inspiration to all of us, and many of the results presented in this book owe to him. 


Preface 


Publication is a self-invasion of privacy. 
Marshall McLuhan 


Individuals in the Information Society want to safeguard their autonomy and retain 
control over their personal information, irrespective of their activities. Information 
technologies generally do not consider those user requirements, thereby putting the 
privacy of the citizen at risk. At the same time, the Internet is changing from a
client-server to a collaborative paradigm. Individuals are contributing throughout their
lives, leaving a life-long trace of personal data. This raises substantial new privacy
challenges.


Saving digital privacy. By 2008, the European project PRIME (Privacy and Identity 
Management for Europe) had demonstrated that existing privacy technologies can 
enable citizens to exercise their legal rights to control their personal information 
in on-line transactions. It had raised considerable awareness amongst stakeholders 
and had significantly advanced the state of the art in the areas of privacy and identity 
management. PrimeLife has been building on the momentum created and the results 
achieved by PRIME to address emerging challenges in the areas of privacy and 
identity management and really bring privacy and identity management to life: 


• A first, short-term goal of PrimeLife was to provide scalable and configurable
privacy and identity management in new and emerging internet services and
applications such as virtual communities and Web 2.0 collaborative applications.

• A second, longer-term goal of PrimeLife was to protect the privacy of individuals
over their whole span of life. Each individual leaves a multitude of traces during
a lifetime of digital interactions. Technological advancements facilitate extensive
data collection, unlimited storage, as well as reuse and life-long linkage of these
digital traces.

• A third goal of PrimeLife was to support privacy and identity management by
progressing the state of the art on

  – tools guaranteeing privacy and trust,
  – the usability experience of privacy and identity management solutions,
  – security and privacy policy systems, and
  – privacy-enabling infrastructures.

• The last, but certainly important, goal of PrimeLife was to disseminate our results
and enable their use in real life. We organized interdisciplinary Summer Schools
of Privacy, organized and participated in standardization groups and meetings,
and made the source code and documentation of most of our prototypes and
implementations available for free use.


This Book. After more than three years of work in PrimeLife, this book aims at giving
an overview of the results achieved. It is therefore structured into an introduction
and six parts covering the most important areas of privacy and identity management
considering the life of today: Several aspects of “Privacy in Life” are discussed in
Part I, followed by Part II “Mechanisms for Privacy”. Part III is dedicated to “Human
Computer Interaction (HCI)”, and Part IV to “Policy Languages”. Part V focuses
on “Infrastructures for Privacy and Identity Management,” before Part VI “Privacy
Live” comes full circle describing how PrimeLife is reaching out to the life of
today’s and the future’s netizens.
Introduction 


This introduction of the book first discusses the need for privacy, PrimeLife’s vision 
and goals. It then elaborates on what privacy and identity are and why protecting 
on-line privacy is so hard. It finally summarises the results of the PrimeLife project 
that are presented in the different parts of this book. 


Chapter 1 
PrimeLife 


Andreas Pfitzmann, Katrin Borcea-Pfitzmann, and Jan Camenisch 


1.1 Motivation 


The Internet continues to be increasingly valuable to individuals, organisations and 
companies. Web usage for everyday tasks such as shopping, banking, and paying 
bills is increasing, and businesses and governments are delivering more services and 
information over the Internet. Users have embraced the Internet for social network- 
ing and substantial collaborative works have emerged including Open Source ini- 
tiatives, collaborative editing of encyclopedias, and self-help groups. Indeed, much 
of the information that is implicitly exchanged when people meet in person is now 
exchanged electronically over the Internet. 

Businesses have also recognised the collaborative potential of the Internet. Com- 
panies offer services for such efforts. Enabled by the ease and effectiveness of on- 
line collaboration, businesses are becoming virtualised and are adopting ad-hoc col- 
laboration and data-sharing. Information technologies have become pervasive and 
affect new areas of our daily lives. For example, a number of countries have introduced or
are about to introduce electronic identity cards and driver’s licenses. Furthermore, 
electronic ticketing and tolling systems are in place all over the world. With the 
increasing number of communications systems, directories, personal information 
managers and social networks, the notion of sharing, viewing, and managing iden- 
tity information becomes an important part of every business and government. This 
issue is a fundamental concept and people will be forced to deal with it. 

Underlying all of these systems are distinct trust models and diverse trust
relationships. Users and businesses rely increasingly upon information they find on
the Internet, often without knowing anything about the originating sources. Thus,
as a central part of their daily interactions, businesses as well as individuals need
to manage not only their identity information but also trust information to assess
their communication partners. For the safe future of the digital society, the concepts
of privacy-enhancing user-centric identity and trust management are central. These
concepts distinguish themselves from other notions of identity and trust management
by insisting that the user, and not some authority, maintains control over
“what, where, when, why, and to whom” her personal information is released. This
notion enforces user consent, which requires that (a) the user’s view of any
transaction corresponds to the actual transaction and that (b) the user agrees to the
execution of the transaction. For example, before a user logs into her banking website,
she is told that she must prove (digitally) her name and birth date, for what purpose,
and how her data will be treated. The user can then either agree to this transaction
and proceed or abort before her data is released. In user-controlled identity
management, the user may moreover choose from many identity providers and also move
her information between them. Important components here are mechanisms for
protecting a user’s privacy and anonymity while still holding the user
accountable if she commits fraud.

J. Camenisch et al. (eds.), Privacy and Identity Management for Life,
DOI 10.1007/978-3-642-20317-6_1, © Springer-Verlag Berlin Heidelberg 2011

In the FP 6 project PRIME, it was shown that privacy-enhancing user-controlled 
identity management is feasible in today’s Internet with prevalent commerce be- 
tween organisations (business, government) and consumers. The PRIME project has 
built and demonstrated the corresponding technology and shown how it enables
privacy protection for individual citizens. While this is sufficient for traditional
server-client style transactions, the Internet has been undergoing fundamental changes in
multiple areas, many of which pose new challenges to privacy and identity management:


Community Focus: The Internet fosters on-line community building. The main 
interactions here are between the members of the communities. Organisations act 
as intermediaries that store and serve data that is generated by individuals. There 
is daily evidence that these communities suffer from privacy and trust problems. 
Also, it has become common practice in companies to search the Internet for 
information about job applicants, including in self-help forums and social networks 
such as Facebook and LinkedIn. Thus, mechanisms are required that allow users 
to establish mutual trust while retaining their privacy, on the one hand, and to 
control the dissemination of their personal information, on the other hand. 

Mashup Applications: New Internet paradigms and technologies are emerging. 
One of the important trends is service composition by means of AJAX and 
mashup technologies (Web 2.0). The dynamic composition, and the fact that contents
originate from different, often questionable sources, make it virtually impossible
to assess the trustworthiness of contents; still, we all increasingly rely upon
information retrieved from the Internet. Although the need for an authentication
and identification infrastructure for the emerging Internet to establish trust
is widely recognised, it is completely unclear how to build it, let alone how
privacy can be protected in this environment.

Lifelong Storage Enabling Unlimited Collection: Storage has become virtually un- 
limited and cheap. Throughout our lives, we engage with a wide variety of dif- 
ferent communities and organisations and thereby use different roles. Our first 
interactions happen as children, assisted by our parents and teachers, later as 
grownups in various professional roles, as parents, and as elderly persons possi- 
bly again assisted by relatives or social workers. An increasing portion of these 
interactions is digitised and will be stored forever. The protection of our private 
sphere and the management of our trust relations are paramount. Previous work 
mainly focused on transactional privacy, i.e., on privacy at a given moment in 
time, and it is unclear what mechanisms are required to ensure life-time protec- 
tion. 


Summing up, within the limits set by law, privacy, trust, and identity management 
should support the user in deciding who can access which of her attributes in which 
situations. Thus, identity management has to do with the entire lifespan of people. 
Identity management has to accompany us from birth to death (and beyond) and 
throughout all areas of our life, i.e. private life, business life and any mixture thereof. 
Identity management has to cover all means of interaction we use with our family, 
friends, colleagues, employers, and public administration. These interactions will 
increasingly be done through or even mediated by computer networks. 


1.2 Vision and Objectives of the PrimeLife Project 


We envision that users will be able to act and interact electronically in an easy and 
intuitive fashion while retaining control of their personal data throughout their life. 
Users might use a multitude of different means to interact with several partners 
employing a variety of platforms. 

For instance, a user Alice might authenticate to an on-line service created by a 
mash-up. She automatically logs on using her laptop and later confirms a payment 
transaction for an electronic article using her mobile phone. Despite many poten- 
tially untrusted services collaborating in this mash-up, Alice is required and able to 
reveal only the minimally necessary information to establish mutual trust and con- 
duct the transaction. No service will learn any personal information about Alice. 
Nevertheless, a merchant is guaranteed payment for the services. 

If Alice now wants to release personal data in an on-line dating service or in a 
network such as Facebook or MySpace, we envision that her personal sphere is pro- 
tected. Even in such ‘data-intensive’ scenarios, Alice can trust that given personal 
data will only be released to peers that Alice trusts to safeguard this data. This cor- 
responds to the real world where Alice releases sensitive data only if she trusts that 
the recipient properly protects it. So for instance, a future employer should not be 
able to access Alice’s entries in on-line dating or self-help forums since no sufficient 
trust has been established. 

When the PrimeLife project was conceived, a number of privacy-enhancing
technologies such as private credential schemes and attribute-based access control
already existed. The PRIME project showed that effective privacy-enhancing
identity management systems can be built from these technologies. However, on
the one hand, these technologies had not yet been applied. On the other hand, they
seemed not to be applicable to the Web 2.0 paradigm, where users want
and need to provide lots of personal information, and furthermore they did not take into
account that people and their roles change over time. The PrimeLife project aimed
to address all of these issues and set itself the following three objectives.


• Research and develop new concepts, approaches, and technologies to protect
privacy for Web 2.0 applications, such as social networks and blogs, and for lifelong
privacy protection and management.

• Make existing privacy-enhancing technologies usable and improve the state of
the art.

• Foster the adoption of privacy-enhancing technologies by providing open source
components and educational materials and through cooperation with standardization
bodies and dedicated workshops.


This book reports on the results that PrimeLife has achieved during its three years. 
In the remainder of this chapter, we first discuss what we mean by privacy and then 
how we believe privacy can be protected in the digital society that is currently being 
built. We conclude this chapter by summarizing the results described in the different 
parts of this book. 


1.3 Defining Privacy 


The term privacy has been discussed for decades by different people on different
occasions, each having (slightly) different meanings in mind. It has become a buzzword
that is used almost inflationarily. Numerous research papers try to analyse perceptions of
privacy and to establish definitions for privacy (e.g., [Mar03, Par83, Phi04]). It is
incontestable that privacy aims, in the first place, at protecting the autonomy of
people. Accordingly, Westin coined a widely accepted definition as follows:


“Privacy is the claim of individuals, groups, or institutions to determine for themselves 
when, how, and to what extent information about them is communicated to others.” [Wes67] 


In [BPPB11], the authors discuss the different stages in the historical development
of the understanding of privacy and of dealing with privacy issues, which are the basis for
the following discussion: In order to enable people to protect their privacy, technical
as well as legal foundations had, and still have, to be laid. Consequently,
means to enforce confidentiality, i.e., hiding private information of an individual
from unauthorised others, as well as data minimisation, i.e., limitation of storage
and processing of personal data, entered the spotlight of researchers, developers,
and regulators. In the last two decades, the requirement of data minimisation became
part of European legal regulations, cf. the data minimisation principle (Directive
2002/58/EC) and the purpose binding principle (Art. 6 (1b), Directive 1995/46/EC).


These days, many Internet users extend their social lives into the digital part of 
the world. This, however, conflicts with the traditional approaches of protecting pri- 
vacy, namely confidentiality and data minimisation, as socialising, i.e., connecting 
to, communicating, and collaborating with each other, always means revealing at 
least some personal information about oneself. Consequently, keeping personal data 
confidential or minimising data is not socially acceptable in every situation. 

With respect to privacy, this means that people need to constantly evaluate
appropriate trade-offs between protecting information about themselves on the one
hand, and releasing as much personal information as is reasonable for socialising
on the other hand. Such trade-offs have two essential characteristics: Firstly, they
have to be decided on for and by every user individually, i.e., in order to take
control of their personal information, all users need to determine for themselves how
their privacy is established. Secondly, the trade-offs are highly context-dependent.
In other words, the selection of particular personal information to be revealed to
others is conditional on the actual situation in which (current) activities take place,
i.e., it depends on variables such as:


• who the nearby actors are (users that are potential or current interactors of the
reference user);

• what exactly the to-be-performed activity of the user in question (the reference
user) is;

• time of activity;

• frequency of activities;

• more specific properties, e.g., namespace of a Wiki page etc.
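
To make the context dependence concrete, the variables listed above can be pictured as a small record that a per-user policy inspects before revealing anything. The following Python sketch is purely illustrative: the names `Context`, `alice_policy`, `TRUSTED`, the attribute names, and the threshold are all assumptions made for this example and are not part of PrimeLife.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Situation in which a disclosure decision is taken (illustrative only)."""
    nearby_actors: set            # potential or current interactors
    activity: str                 # the to-be-performed activity
    hour_of_day: int              # time of activity (0-23)
    activities_per_day: int       # frequency of activities
    properties: dict = field(default_factory=dict)  # e.g. Wiki namespace


TRUSTED = {"bob", "carol"}        # hypothetical set of peers Alice trusts


def alice_policy(ctx: Context) -> set:
    """Hypothetical policy of one user: attributes she reveals in ctx."""
    revealed = {"nickname"}       # minimum needed to socialise at all
    if ctx.activity == "edit_wiki" and ctx.properties.get("namespace") == "work":
        revealed.add("employer")
    if ctx.nearby_actors and ctx.nearby_actors <= TRUSTED:
        revealed.add("real_name")  # only trusted peers are present
    if ctx.activities_per_day > 50:
        revealed.discard("real_name")  # frequent activity eases profiling
    return revealed


chat_with_friend = Context({"bob"}, "chat", 20, 3)
work_edit = Context({"bob", "mallory"}, "edit_wiki", 10, 3, {"namespace": "work"})
```

The point of the sketch is only that the same user reveals different attribute sets in different contexts; a real policy would be decided by each user individually, as stated above.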


As described in the motivational section (cf. Section 1.1), privacy-enhancing
identity management is one of the essential means of coping with the challenges that
recent developments in the IT field pose to the users’ privacy (as discussed above:
Communities, Mashup Applications, and Lifelong Data Storage). The PrimeLife
project, as well as its predecessor PRIME, took up these challenges to research and
develop solutions that enable users to retain control over their personal information.
The special focus of the PrimeLife project, the results of which are described
in this book, is on ensuring that citizens, i.e., the general public, adopt the developed
technology covering lifelong privacy as well as privacy-enhancing identity
management.


1.4 From Identity via Identity Management to Privacy by 
Identity Management 


The concepts of identity and identity management deal with describing someone’s
personality. But what are the data subjects¹ that can have an “identity”? Almost
everyone has natural persons in mind when referring to identities. However, an identity
could also point to a legal person, a group of users, or even a computing
device such as a laptop or mobile phone. The latter is true when, e.g., a person (let’s
call him Bob) takes a phone with him all the time. In this case, if Bob allowed
others to use the location tracking service of his phone, they could track at which
times he moves where. This little example shows very well the need for identity
management, not only for users, but also for the computing devices that represent
them or even act on their behalf.


1 By data subjects we refer to entities being able to interact via communication infrastructures with
other entities, i.e., natural and legal persons as well as devices used to represent them in interactions.
Sometimes, even sets of persons are called data subjects.


1.4.1 Identity — What It Is 


The authors of [PBP10] conducted a structured analysis on the concepts of identity 
and privacy-enhancing identity management. Though the overall aim of their paper 
is to display the specifics of lifelong privacy and what consequences that setting has 
for privacy-enhancing identity management (see also Section 1.5), the basics of the 
concepts of identity and privacy-enhancing identity management are comprehen- 
sively outlined. The following statements put that analysis into the context of this 
book. 

The concept of identity is only vaguely clear to most people. It relates not only 
to names, which are easy to remember for human beings, but goes far beyond iden- 
tifiers, which typically link an identity to a certain context and which usually are 
unique in that context. As an initial approach, the notion of identity is described as 
follows: 


Identity is a set of attribute values related to one and the same data subject. 


Specifics and properties of identity-related attributes will be discussed later (cf. Sec- 
tion 1.4.2.1). Nevertheless, we allude here to attribute values being determined ei- 
ther by the identity holder himself or by others. Considering the earlier mentioned 
control over personal information by the owner, it is essential to consider this differ- 
ence: differentiating between the two kinds of attribute assignment is crucial with 
respect to the possibilities of privacy management one has. 

Considering time aspects, we have to extend the definition introduced above:
attribute values used to specify an identity may change over time. Since
all values an identity-related attribute can take are essential to describe the
identity of its data subject, it is necessary to add to each attribute value a timestamp
indicating when that attribute value is valid.² Following this train of thought, we can
further state:


An identity as a set of attribute values valid at a particular time can stay the same or grow, 
but never shrink. 


This is true both for a global observer and for each party (or set of parties)
interacting with the entity represented by the identity. Therefore, if an unauthorised
entity (a potential adversary) has no access to the change history of each particular
attribute, whether a particular subset of attribute values of an entity is an
identity³ or not may change over time. If the adversary has access to the change
history of each particular attribute, any subset of attribute values forming an identity,
which sufficiently identifies its holder within a set of data subjects, will form such an
identity from the adversary’s perspective irrespective of how attribute values change.


2 A valid attribute value means that it is used to represent its holder in a given setting.

Any reasonable adversary will not just try to figure out attribute values per se,
but also the points in time (or even the timeframes) at which they are valid. This is because
such change histories enable the linking of data, thereby allowing the adversary to
infer further attribute values. Therefore, it may help to define each attribute in such
a way that its value(s) cannot become invalid. For example, instead of the attribute
location of a particular individual person, take the set of attributes location at time
x. Depending on the inferences one is interested in, refining that set as a list ordered
by location or time may be helpful.
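
The notion of an identity as a set of timestamped attribute values that can grow but never shrink can be sketched as an append-only data structure. The following Python fragment is a minimal illustration under assumptions of our own: the class names `AttributeValue` and `Identity`, the integer timestamps, and the example values are hypothetical and not taken from [PBP10].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeValue:
    name: str        # e.g. "location"
    value: str
    valid_from: int  # timestamp at which this value became valid


class Identity:
    """Append-only set of timestamped attribute values of one data subject."""

    def __init__(self):
        self._values = set()

    def add(self, name, value, valid_from):
        # Adding a value that is already known leaves the identity unchanged.
        self._values.add(AttributeValue(name, value, valid_from))

    # Deliberately no remove(): the set can stay the same or grow, never shrink.

    def __len__(self):
        return len(self._values)

    def history(self, name):
        """Change history of one attribute, ordered by time."""
        matching = (v for v in self._values if v.name == name)
        return sorted(matching, key=lambda v: v.valid_from)

    def value_at(self, name, t):
        """Value of `name` valid at time t, or None if none is recorded yet."""
        past = [v for v in self.history(name) if v.valid_from <= t]
        return past[-1].value if past else None


john = Identity()
john.add("location", "Dresden", 1)
john.add("location", "Zurich", 5)
```

Here `history("location")` corresponds to the ordered list of "location at time x" values mentioned above; an adversary holding this history can link observations that a single current value would not connect.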


1.4.1.1 Partial Identities 


Bearing in mind that identities usually grow over time and that the probability of
identifying the entity within the given subset of entities usually grows as well, a
solution is needed to solve this privacy-related dilemma, i.e., to preserve the
entity’s privacy. The idea is to subset the identity of an individual, the result of which
should be a possibly very large set of so-called partial identities [PH10].
Each partial identity may have its own name, own identifier, and own means of
authentication. In a certain sense, each partial identity might be seen as a full-fledged
identity of someone or something.

Selecting the appropriate approach for subsetting the attribute values is crucial,
as it determines whether the establishment of partial identities is reasonable.
Obviously, if subsetting is done poorly, e.g., by using identifying attributes within a major
part of the entity’s partial identities and thus allowing different partial identities
to be linked to one and the same entity, it will not solve the privacy-related dilemma;
it will only make the life of the respective person more complicated. Consequently, the
right tools have to be used and subsetting one’s identity has to be done in the right
way. This helps not only the person whose identity is under consideration, but
also the people communicating with her or him, because partial identities
should consist of only those attribute values which are really needed within that
particular relationship or context.
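
The difference between good and poor subsetting can be illustrated with a small Python sketch. It assumes, as a strong simplification, that a fixed set of attributes is known to be identifying and that two partial identities are linkable exactly when they share a value for one of them; real linkability also depends on combinations of attributes and on the size of the anonymity set. All attribute names and values are made up for the example.

```python
# Hypothetical full identity of "John"; every value is invented.
FULL_IDENTITY = {
    "name": "John Doe",
    "passport_no": "X1234567",
    "credit_card": "4111-0000",
    "blood_group": "A+",
    "nickname": "jd42",
    "favourite_team": "FC Example",
}

# Assumption: these attributes identify John on their own.
IDENTIFYING = {"name", "passport_no", "credit_card"}


def partial_identity(full, attribute_names):
    """Subset of the full identity used in one relationship or context."""
    return {k: full[k] for k in attribute_names}


def linkable(p1, p2, identifying=IDENTIFYING):
    """True if both partial identities expose the same identifying value."""
    shared = set(p1) & set(p2) & set(identifying)
    return any(p1[k] == p2[k] for k in shared)


# Poor subsetting: the name is reused in both contexts.
health = partial_identity(FULL_IDENTITY, ["name", "blood_group"])
shopping = partial_identity(FULL_IDENTITY, ["name", "credit_card"])

# Better subsetting: only what the context really needs.
leisure = partial_identity(FULL_IDENTITY, ["nickname", "favourite_team"])
```

In this sketch `health` and `shopping` can be linked through the shared name, whereas `leisure` shares no identifying attribute with either of them.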

Figure 1.1 shows a snapshot of a person’s (“John”) possible partial identities in 
different contexts. The dark-grey areas represent different partial identities of John 
overlapping with parts of his full identity, represented by the light-grey area. While 
one may assume that this full identity, as well as its partial identities, are related 
to John’s activities in either the online sphere or the physical sphere, activities may 
also spread to the other sphere. 


3 According to [PH10], an identity is defined as “any subset of attribute values of an individual 
person which sufficiently identifies this individual person within any set of persons” or more ex- 
plicitly: “An identity is any subset of attribute values of an individual person which sufficiently 
distinguishes this individual person from all other persons within any set of persons.” 
Fig. 1.1: Partial identities of an individual (“John”) [HBPP05]. 


The authors even assume that it is difficult to say if there will be any differentia- 
tion between those two “spheres” in the next 50 or 100 years. Ambient intelligence 
and ubiquitous/pervasive computing might make the boundaries blur and eventually 
disappear. This means that differentiating between identity-related data of the on- 
line and of the physical spheres might not make sense anymore. To conclude, when 
looking into the future, subsetting the identity/ies is absolutely essential whenever 
one strives for privacy. 


1.4.1.2 Requirements for Using Partial Identities 


Taking advantage of partial identities requires a basic understanding of that concept 
by the data subject concerned. Of course, communication partners such as govern- 
ments and businesses have to understand it as well, since managing one’s (partial) 
identities makes sense only if the interacting entities are willing to accept it. 

Further, the authors assume that every person possesses at least one computer (or
some other device able to execute the corresponding computations) that administers
the person’s personal data and executes cryptographic protocols. This personal
computer is assumed to be fully controlled by the user (otherwise there is no way to
validate privacy properties).⁴


4 This is in contrast to the typical digital rights management (DRM) scenario where the users have
very limited control over their devices and the data processed by them. The authors are fully aware
that assuming that everyone has a computer fully under their control today is a very daring
statement. However, when people talk about secure e-commerce, they assume the same. So, as there
are “major commercial forces” striving in that direction, it can be expected that the assumption
the authors have made will become a more realistic one during the next 20 years.


With a large set of (partial) identities, each of these (partial) identities needs
its own means of secure authentication (otherwise there is no way to achieve
accountability). This can be fulfilled by digital pseudonyms, which are similar to
traditional (cryptographic) public keys and offer the strong privacy properties that we
advocate. Besides allowing for secure authentication, digital pseudonyms serve as
unique identifiers of the respective (partial) identity and are used to authenticate
(sign) items originated by the holder in a way that recipients can check (based
on [PH10]).

Last but not least, means are required that enable a person to transfer certified
attributes between her different partial identities. It is important that this
process by itself does not reveal that the different partial identities relate to the
same person (unless the shared attributes uniquely identify the person). This transfer
of certified attributes can be achieved by anonymous credentials, which were
introduced by David Chaum in [Cha85] and of which a number of practical implementations
are known today (cf. Chapter 5).

Indeed, anonymous credentials represent the appropriate basis for sharing
certified attributes between partial identities of the same entity. Without anonymous
credentials, the applicability of partial identities would be severely reduced.


1.4.2 Presentation of Identities – Pseudonyms


Considering the use of partial identities in particular, one has to be aware that, first,
partial identities have to be consciously created and established. Secondly, usage
patterns of the partial identities⁵ drive linkability of the attribute values and, thus,
the conclusions that could be inferred. This means that users should partition their
online activities; a systematic approach of partitioning according to (disclosure)
contexts of activity is called context management [BPPB11].


5 When referring to usage patterns of partial identities, we address different aspects, e.g., how
frequently a partial identity is communicated; how fine-grained the context is in which the
partial identity is used; what rules are applied when selecting a particular partial identity for an
interaction.


Identities or partial identities of an entity are represented using (digital)
pseudonyms. Those pseudonyms serve as identifiers of the (partial) identities, on the one
hand, and as addresses of the (partial) identities, on the other hand. In order to
indicate holdership of a (partial) identity, an explicit link between the pseudonym and
the holder of the attributes of that (partial) identity has to be created. Different kinds
of initial linking between a pseudonym and its holder can be distinguished [PH10]:

• Public pseudonym: The linking between a pseudonym and its holder may be
publicly known from the very beginning, e.g., the phone number with its holder
listed in public directories.

• Initially non-public pseudonym: The linking between a pseudonym and its holder
may be known by certain parties (trustees for the according identity), but it is not
public initially, e.g., a bank account with the bank as trustee for the identity.

• Initially unlinked pseudonym: The linking between a pseudonym and its holder is,
at least initially, not known to anybody (except to the holder), e.g., biometric
characteristics such as DNA (as long as it is not stored in a DNA register).


As previously mentioned, according to the usage patterns of partial identities and 
their pseudonyms, various types of pseudonyms can be distinguished. The differ- 
entiation of pseudonyms is closely related to different levels of anonymity that are 
achievable by the usage patterns. 
Fig. 1.2: Pseudonyms – Use in different contexts leading to partial order (based on
[PH10]).


Figure 1.2 illustrates that interrelation. According to this, person pseudonyms,
i.e., names or identifiers directly identifying a real person, imply the highest
degree of linkability and, thus, offer the least possible anonymity to their holders.
Examples of such pseudonyms are identity card numbers or the well-known
social security number, which are used with very diverse communication
partners and in many different contexts. Further, they typically are associated with
their holders over their entire lifetime. This means that each time a user communicates
by indicating her/his person pseudonym, all of that person’s activities could
potentially be linked together. As a result, quite a detailed profile describing that person
could be created.

In comparison, role pseudonyms and relationship pseudonyms are used within
particular contexts only. A role pseudonym is used by its holder when acting in
a certain role; pen names are an example. Relationship pseudonyms are similar.
They refer to entities within particular relationships, e.g., a pseudonym denoting
someone in his or her relationship to a sports club. In this case, it does not matter
whether the person acts in the role of trainer or of athlete. The two pseudonym
types are distinguished according to the following rules: Whenever a pseudonym
specifies a person communicating with specified other entities, we speak of a
relationship pseudonym. In contrast, whenever users specify as what/whom they
communicate, they are using role pseudonyms. Linkability is, therefore, restricted
to the activities performed within the given relationship or when acting in a
particular role and using the relevant pseudonym.

Even more privacy, in terms of less linkability and stronger anonymity, can be
reached with the help of role-relationship pseudonyms. Restricting a pseudonym to
a particular relationship and, at the same time, to a specific role (e.g., the
relationship to a particular sports club and either the role of trainer or that of
athlete) narrows the range of scenarios in which one and the same pseudonym is
used. In order to preserve privacy and still enable long-term communication with
others, more role-relationship pseudonyms (and partial identities) have to be
created for more specific contexts.

If the goal is to have the least linkability and utmost anonymity when
communicating via a computer network, one has to use transaction pseudonyms,
which implies one-time use of pseudonyms. Different actions of the pseudonym
holder can then no longer be linked via the pseudonyms alone, since the user
creates a new pseudonym for each interaction that is visible outside the user’s
personal computer.

The classification of pseudonyms as given above is a rather rough means to con- 
tribute to tool development supporting the user in decision making with respect to 
the selection of pseudonyms or partial identities. If a user decides, e.g., to re-use a 
pseudonym that initially was created to be used only once (i.e., for one transaction 
only), it will lose its property of a transaction pseudonym. 
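
The pseudonym types discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class PseudonymManager, the function fresh_pseudonym and the parameter names are hypothetical and not part of any PrimeLife component. The sketch merely shows how the choice of pseudonym depends on role, relationship and one-time use.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def fresh_pseudonym() -> str:
    """Return a random identifier that carries no information about its holder."""
    return secrets.token_hex(8)


@dataclass
class PseudonymManager:
    """Hypothetical helper (not a PrimeLife API): one pseudonym per usage pattern."""
    person_pseudonym: str = field(default_factory=fresh_pseudonym)
    by_context: Dict[Tuple[Optional[str], Optional[str]], str] = field(
        default_factory=dict)

    def select(self, role: Optional[str] = None,
               relationship: Optional[str] = None,
               transaction: bool = False) -> str:
        if transaction:
            # Transaction pseudonym: never stored, never reused.
            return fresh_pseudonym()
        if role is None and relationship is None:
            # Person pseudonym: the same in every context (highest linkability).
            return self.person_pseudonym
        # (role, None): role pseudonym; (None, relationship): relationship
        # pseudonym; (role, relationship): role-relationship pseudonym.
        key = (role, relationship)
        if key not in self.by_context:
            self.by_context[key] = fresh_pseudonym()
        return self.by_context[key]


manager = PseudonymManager()
as_trainer = manager.select(role="trainer")
as_trainer_in_club = manager.select(role="trainer", relationship="sports club")
one_time = manager.select(transaction=True)
```

Actions performed under as_trainer can be linked to each other but not, via the pseudonym alone, to actions performed under as_trainer_in_club. If the holder stored and reused one_time, it would stop being a transaction pseudonym, which is the effect described in the preceding paragraph.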


1.4.2.1 Important Kinds of Attributes 


When looking at attributes of (partial) identities, we can observe several kinds of 
attributes, each of them requiring a particular degree of protection when striving 
for privacy. In addition to the already mentioned attribute types name, identifier, 
and means of authentication, we distinguish biometrics, addresses (used for com- 
munication), bank accounts, credit card numbers etc. To a large degree, all of these 
are used for uniquely identifying entities. Biometrics, as one of these attribute types,
is a well-known concept of the physical world that has been used for identifying
persons for hundreds of years. However, storing and evaluating biometrics by
computer is relatively new. Biometrics can be helpful for binding computing devices to
a natural person. However, it can also be critical if it is used in contradiction to
people’s privacy attitudes.

With respect to classifying identity-related attributes, there are different possibil- 
ities: 


• One of the main distinctions that can be made with respect to attributes is whether
they are authenticated. If so, there are two possibilities regarding who authenticated
the attribute. The first option is authentication by the first party, the data subject.
In this case, it is a claim that the data subject makes about her/himself, and the claim
is only as trustworthy as the data subject. The second option is authentication by a
third party. Here, the authors explicitly do not refer to a trusted third party, since
the following two questions have to be clarified for each situation individually: By
whom is the third party trusted, and with respect to what?

• Another classification refers to who knows the attribute value, i.e., is the
attribute value known only to the first party (the data subject), also to second
parties, i.e., the data subject’s communication partners, or even to a third
party that the first and second parties might not be aware of?

• Attributes can be classified according to their degree of changeability. Can
attribute values be changed easily, or is this hard to do? What possibilities does the
entity have to change the attribute value?

• Variability of attributes over time is another possible classification, ranging from
non-varying to fully varying. In this context, it may also be interesting whether
changes of attribute values can be predicted with respect to when and what.

• Attributes can be distinguished according to who defines the attribute values, i.e.,
are the attribute values given to the data subject by an external source, or did
the data subject her/himself choose them? This difference plays a
special role in discussions of the possibilities of user control (cf. Section 1.4.1).⁶

• A further classification could be based on the actual information the attribute
value contains. Are we talking about pure attributes, whose values contain only
information about themselves, or do the attribute values also
contain significant side information?⁷

• Attributes can also be classified according to the relationships the data subject
has. One could ask whether an attribute value characterises a single entity per se or
an entity only in its relationship to other entities, e.g., entity A likes/loves/hates
entity B.

• Sensitivity of attribute values in certain contexts can be seen as an additional
means to classify attributes, though this might be a very subjective approach.


6 To give an example: if we refer to the attribute colour of hair, then its value can be a given (natural
hair colour) or a chosen (after chemical dyeing) attribute.

7 Let us assume we use biometrics, i.e., an image of someone’s face available in a high resolution. 
From this, some doctors possibly may conclude some diseases. 
However, when considering long-term use, attributes judged to be
non-sensitive today may become quite sensitive in the future (e.g., after a
change of the social order).


From those classifications, conclusions can be drawn regarding how
much protection attributes or attribute values need. Presumably, some attribute
values need much more privacy protection than others, e.g., those which

• are not easy to change,⁸

• do not vary over time or can be predicted,

• are given attribute values,

• might contain significant side information,⁹ or

• are sensitive or might become sensitive in at least one context.


These attribute values are part of the core identity. Of course, it would be nice to
protect everything. Realistically, however, this is almost impossible, especially in
situations where socialising is intended or even required. When one starts to manage
identity attributes, one has to determine what defines one’s core identity: which
attributes really belong to that core identity and therefore need appropriate protection?
Advancements in and use of technology may shift some attributes from “core
identity” to “non-core identity”; e.g., the address of someone’s house or flat is core for
him/her, whereas the current address of his/her laptop may not be.
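
The criteria listed above can be turned into a simple checklist. The following Python sketch is an assumption-laden illustration, not a method defined by the authors: the class Attribute, its field names, and the example values chosen for a fingerprint and a laptop address are hypothetical. It only shows how the classifications could feed a decision about which attribute values deserve more protection.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attribute:
    """Hypothetical record of the classifications discussed in the text."""
    name: str
    easy_to_change: bool
    varies_over_time: bool
    predictable: bool
    given: bool                # given by an external source rather than chosen
    side_information: bool     # value may reveal more than the attribute itself
    possibly_sensitive: bool   # sensitive now or possibly in some future context


def protection_reasons(a: Attribute) -> List[str]:
    """Return the criteria from the list above that apply to the attribute."""
    reasons = []
    if not a.easy_to_change:
        reasons.append("not easy to change")
    if not a.varies_over_time or a.predictable:
        reasons.append("does not vary over time or can be predicted")
    if a.given:
        reasons.append("given attribute value")
    if a.side_information:
        reasons.append("might contain significant side information")
    if a.possibly_sensitive:
        reasons.append("sensitive or might become sensitive")
    return reasons


# Example values are illustrative assumptions, not measurements.
fingerprint = Attribute("fingerprint", easy_to_change=False, varies_over_time=False,
                        predictable=True, given=True, side_information=True,
                        possibly_sensitive=True)
laptop_address = Attribute("current laptop address", easy_to_change=True,
                           varies_over_time=True, predictable=False, given=False,
                           side_information=False, possibly_sensitive=False)

for attribute in (fingerprint, laptop_address):
    print(attribute.name, "->", protection_reasons(attribute))
```

An attribute for which many criteria apply is a candidate for the core identity; an attribute for which none applies is not. Where exactly to draw the line remains a judgement of the individual, as the text points out.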


1.4.3 Time Aspects of Identity Management and Privacy 


Time-related aspects are another interesting issue to consider when dealing with
privacy. This subsection gives a first overview; Chapter 4 discusses them in more
detail.

The design of privacy-preserving solutions, especially those aiming at
privacy-enhancing identity management, must not stop at supporting the user in managing
her/his present identities. Since any kind of privacy intrusion may have
implications for the individual’s future life, the issues related to
long-term aspects of privacy-enhancing identity management need to be identified and
understood.

Controlling the disclosure of one’s personal data throughout one’s entire life
spans a timeframe of nearly 100 years and, seen from the current moment
in time, takes the past, the present, and the future of the person concerned into
account. During that timeframe, an individual’s world can change drastically, i.e.,


8 To give an example of the necessity to protect those attributes, imagine biometrics becoming 
widely known. Then, it might become necessary, but be very hard, to change the biometrics (which 
could mean, e.g., handing out new fingerprints to everybody). In comparison, cryptographic keys 
can easily be revoked and new ones generated. 

9 Nobody knows which algorithms for analysis of side information will become available during
the next years.
information and communication technology develops, and each individual’s
appreciation of privacy will change several times in her or his life.

In the authors’ opinion, it is difficult, if not impossible, to make data fade away. 
Each time a user uses the Internet, s/he creates traces. What s/he cannot do is reli- 
ably cause data to be destroyed on other persons’ or organisations’ machines. This is 
a very important issue to be considered in this context. Accordingly, we need mech- 
anisms that can realise the above mentioned privacy-related concepts. In the first 
place, hiding should be given priority over disclosing data. For this, identity man- 
agement and user control are the right means. Also, it is essential to have assured 
long-term security by using information-theoretically secure cryptography [PBP10]. 

Fig. 1.3: Example of how partial identities develop throughout life and in various
areas of life (based on [CHP+09]).


As indicated, a person’s attitude regarding his/her privacy will change as s/he passes
through various phases of life and acts in different areas of life. Figure 1.3 is an
attempt to depict disclosures of personal data during an individual’s lifetime, as
sketched in [CHP+09, HPS08]. Usually, even before a human being is
born, a lot of personal data about the unborn child is gathered. Such gathering
continues throughout the human being’s life. The data is stored as individual partial
identities with the various data controllers involved. Finally, when the person passes
away, the evolution of the (partial) identities is terminated. However, termination does
not mean that the data disappears. In many cases, the data will be stored further, e.g.,
in backups.

To conclude, the management of lifelong privacy means (1) covering the full
lifespan by considering short-term as well as long-term effects; (2) covering all
areas of life by addressing context-specific as well as context-spanning aspects; and (3)
covering different stages of life by respecting constant as well as changing abilities
or behaviour of individuals.


1.5 Further Facets of Privacy 


Besides the basic concepts and ideas discussed in this chapter,
privacy has further facets. A selection that is especially interesting in the context
of the PrimeLife project is explored in the following.

Initially, we indicated that people have different understandings of the concept of
privacy. This can primarily be attributed to the historical development of privacy
perceptions or to the focal points of people’s work or social lives.¹⁰
The following list indicates the main concepts related to privacy, which are sometimes
put on the same level as privacy (note that these items are not to be understood as
being on equal footing):


• Related to direct identifiability (the concepts have a direct relation to the
identities and identifiability of persons):


– Anonymity in the meaning of non-identifiability of an entity within a certain
set of entities [PH10];

– Pseudonymity in the sense of being recognisable in selected situations while
(re-)using particular pseudonyms.


• Related to indirect identifiability (the concepts relate to the personal data of
entities possibly used to (not) identify the entity concerned):


– Confidentiality: hiding private things from unauthorised users by establishing
a private sphere. Confidentiality can be achieved using cryptographic
algorithms applied to the communicated data itself or to the communication
channel;

– Data minimisation, a term primarily used within legal contexts, refers to limiting
the storage and processing of personal data. Data minimisation is, in the first
place, a legal means to assure privacy, but it should also be one of the basic
principles of application developers when designing social software, i.e.,
applications have to be designed in such a way that the processing of personal
data is kept to a minimum, and users should be offered different options
that allow for data minimisation;

– User control relates to users determining themselves which of their personal
data they disclose to whom in which situation.


• Contextual integrity as a notion of privacy is not yet a common idea. Although
the term is new, it already has two privacy-related understandings: The first was


10 Often, even researchers equate privacy with, e.g., anonymity. 


20 A. Pfitzmann, K. Borcea-Pfitzmann, J. Camenisch 


coined by Helen Nissenbaum in [Nis98]. In order to protect privacy in
Nissenbaum’s understanding, personal data must not leave the originating social
context. This approach is similar to the already indicated ideas of minimising
personal data. The second understanding has been discussed by Borcea-Pfitzmann
et al. in [BPPB11]. Their idea is to allow personal data to be distributed and
processed, under the condition that data describing the context from which the
personal data originates is transferred together with the actual personal data.


This list of privacy-related concepts is neither intended to be complete nor to rank
those concepts. Instead, it should give the reader an insight into the different views on
and approaches to privacy that researchers, developers and users have to deal with.
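
The two understandings of contextual integrity mentioned in the list above can be contrasted in a small Python sketch. It is a simplified illustration under our own assumptions: the names Context, ContextualisedData and transfer are hypothetical and are not taken from [Nis98] or [BPPB11]. In the strict variant, data must not leave its originating context; in the second variant, data may be passed on as long as the description of its originating context travels with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    name: str      # e.g. "sports club"
    purpose: str   # why the data was disclosed in this context


@dataclass(frozen=True)
class ContextualisedData:
    value: str
    origin: Context   # description of the originating context


def transfer(item: ContextualisedData, target: Context,
             strict: bool) -> ContextualisedData:
    """Return the record as it arrives in the target context."""
    if strict and target != item.origin:
        # First understanding: data stays in its originating social context.
        raise PermissionError("data must not leave its originating context")
    # Second understanding: the data may move, but the origin description
    # is never stripped from it.
    return ContextualisedData(value=item.value, origin=item.origin)


club = Context("sports club", "membership administration")
employer = Context("employer", "recruitment")
licence = ContextualisedData("trainer licence B", origin=club)

received = transfer(licence, employer, strict=False)
print(received.origin.name)   # the recipient still sees where the data came from
```

The sketch leaves open how a recipient is obliged to respect the attached context description; that is a matter of policy and enforcement, not of the data structure.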


1.6 How to Protect Privacy and PrimeLife’s Contributions 


Protecting one’s digital privacy first and foremost involves managing and controlling
the release and dispersal of one’s personal information. Of course, we cannot hope to
completely control information about ourselves in an open society and, in particular,
when using open communication media such as the Internet. Worse, today we
have essentially lost control over our information and over its use and dispersal by
others, be it by corporations for profit, by other users for fun, or by criminals for
identity fraud and other misconduct.

Regaining control over our personal data is not only important to protect ourselves
from fraud; it is also required for our future society and marketplace to
flourish. A democracy cannot work, e.g., without elections in which citizens are
guaranteed anonymous votes, and in markets where information itself becomes a good, it
is essential that information be protected and that its use and dispersal can be governed
by its owner.

While there is some legal and social protection of our privacy, the best way to
protect our privacy would be not to reveal any information about ourselves at all. This,
of course, does not work: we need to provide information about ourselves whenever
we want to interact with other people or service providers or use applications
over networks. We advocate, however, that the information we need to reveal in
interactions be minimised, on the one hand, and, once revealed, that the information
be protected and its use be governed by policies and access control systems,
on the other hand. This requires electronic services and applications to follow “privacy
by design.” It is not sufficient that designers and developers are privacy minded;
they also need to know what privacy-enhancing technologies can offer, how they
work, and how they can be employed, and they need to be provided with these
technologies. Privacy-enhancing technologies include the classical ones such as onion
routing protocols for anonymous communication, private credential systems and minimal
disclosure tokens, and database anonymisation techniques. They also include suitable
access control mechanisms and policy languages, infrastructure components and, most
importantly, user interfaces that enable the users to exercise their control.

The technologies known before PrimeLife started focused primarily on single
transactions: how to minimise the information released in these transactions and
how to protect this information once it was released. However, these approaches do
not take into account the fact that people and their roles change over time, nor do
they fit the paradigm shifts of the so-called Web 2.0, where people use the Internet
quite differently and share lots of personal data on social networks.

As we have already mentioned, the PrimeLife project aimed to enable users to
protect their privacy in a digital society by pursuing the following three objectives:


• Research and develop new concepts, approaches, and technologies to protect
privacy for Web 2.0 applications, such as social networks and collaborative
applications, and for lifelong privacy protection and management.

• Make existing privacy-enhancing technologies usable and improve the state of
the art.

• Foster the adoption of privacy-enhancing technologies by providing open source
components and educational materials, by cooperating with standardisation
bodies, and by holding dedicated workshops.


The work towards these goals was structured in six activities, each consisting of 
a number of work packages (WPs). The first activity (Privacy in Life) addressed 
the first objective and produced a number of new mechanisms to protect privacy 
in social networks and Internet forums. It also studied the problem of access to 
personal data and the delegation thereof. 

Activities 4 (HCI), 5 (Policies), and 6 (Infrastructure) addressed the second
objective. In particular, Activity 5 was concerned with developing policy languages
and access control systems for attribute-based access control, enabling the use of
anonymous credentials or minimal disclosure tokens for authentication and hence
privacy-enabling access control. Activity 6 studied the infrastructure requirements
for privacy-enhancing identity management and access control and how the
infrastructure can be changed to meet them. Activity 4 researched and developed
user interfaces for the different mechanisms that the project developed so that they
become usable by end users.

Activity 2 (Mechanisms) developed new mechanisms and improved existing ones
as needed by all the other activities, and has indeed produced an impressive number
of new research results and prototypes. Finally, Activity 3 (Privacy Live!) was
concerned with fostering the adoption of the technologies
produced by PrimeLife. To this end, a number of workshops were held, contributions
to standardisation bodies were made, and many of the technologies were provided
as open source components. The book you are holding in your hands is also a result
of this activity.

The different parts of this book present the results of the different activities of 
PrimeLife. The following section provides brief summaries of these results per ac- 
tivity. 

1.6.1 Part I - Privacy in Life


It was a goal of PrimeLife to provide sustainable, scalable and configurable privacy
and identity management for an individual in various situations of his or her life.

The first focus of the research in this area was on assisting users in new and
emerging Internet services and applications such as virtual communities and Web
2.0 collaborative applications. This was achieved by setting up demonstrators showing
how audience segregation for data can be done, as documented in Part I, Chapter 2.
Additionally, in Chapter 3, we describe trust mechanisms that help users decide
on the trustworthiness of data delivered by others, with all users concurrently having
privacy requirements on the data they request and deliver. The scenarios we chose
for Chapters 2 and 3 cover a broad range of Web 2.0 applications (blogs, wikis,
social networks, forums). The prototypes we built for these scenarios served as a
basis for experiments to find out which indicators raise users’ awareness with
respect to data privacy and trustworthiness.

The second focus of research was on the lifetime aspect of privacy that a user faces, as
we outline in Part I, Chapter 4. Each individual leaves a multitude of traces during
a lifetime of digital interactions. While parts of these traces are left behind
unconsciously and are not meant as important information, a lot of information is very
relevant and important for specific domains. This information, which comprises
information about the individual as well as all sorts of documents, has to be accessible
by someone at any time. This implies that, in case an individual is not able to manage
this information, others need to obtain access. As an exemplary demonstrator, we built a
backup tool that takes into account new ways of interacting. The tool provides for
the backup and synchronisation of documents. With a view to the specific requirements
of lifelong privacy and identity management, mechanisms are built
in to safeguard individuals, to improve control over the data and to allow for
delegation.


1.6.2 Part II - Mechanisms to Enable Privacy 


Today’s society places great demand on the dissemination and sharing of informa- 
tion. Such a great availability of data, together with the increase of the computational 
power available today, puts the privacy of individuals at great risk. The objective of 
the mechanisms activity is therefore to do novel research on the different open issues 
of the complex problem of guaranteeing privacy and trust in the electronic society. 
Chapter 5 focuses on privacy-enhancing cryptographic technologies that can be used
in practice. The chapter presents anonymous credential schemes and their extensions
along with cryptographic applications such as electronic voting and oblivious
transfer with access control. Chapters 6 and 7 address mechanisms supporting the
privacy of the users (transparency support tools, privacy measurement) and their
electronic interactions. In particular, Chapter 6 illustrates a privacy-preserving secure
log system as an example of a transparency-supporting tool and an interoperable
reputation system. Chapter 8 investigates the problem of assessing the degree of
protection offered by published data and of protecting the privacy of large data
collections that contain sensitive information about users. The chapter presents an
information-theoretic formulation of privacy risk measures and describes fragmentation-based
techniques to protect sensitive data as well as sensitive associations. Chapter 9 ad- 
dresses the problem of providing users with means to control access to their infor- 
mation when stored at external (possibly untrusted) parties, presenting new models 
and methods for the definition and enforcement of access control restrictions on 
user-generated data. The chapter illustrates a novel solution based on translating 
the access control policy regulating data into an equivalent encryption policy deter- 
mining the keys with which data are encrypted for external storage. The solution is 
complemented by an approach based on two layers of encryption for delegating to 
the external server possible updates to the access control policy (without the need 
for the data owner to re-encrypt and re-upload resources). 
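
The idea of translating an access control policy into an equivalent encryption policy can be illustrated with a deliberately naive Python sketch. It is not the construction presented in Chapter 9: we simply assume one random key per distinct set of authorised users, omit the actual encryption, and leave out the second encryption layer used for policy updates. The function name encryption_policy and the example resources are hypothetical.

```python
import secrets
from typing import Dict, FrozenSet, Set, Tuple


def encryption_policy(
    acl: Dict[str, Set[str]]
) -> Tuple[Dict[str, str], Dict[str, Set[str]], Dict[str, bytes]]:
    """Derive key assignments from an access control list (resource -> users).

    Returns (resource -> key id, user -> key ids held, key id -> key bytes).
    Naive assumption: one key per distinct set of authorised users.
    """
    key_of_group: Dict[FrozenSet[str], str] = {}
    keys: Dict[str, bytes] = {}
    resource_key: Dict[str, str] = {}
    user_keys: Dict[str, Set[str]] = {}
    for resource, users in sorted(acl.items()):
        group = frozenset(users)
        if group not in key_of_group:
            key_id = f"k{len(key_of_group)}"
            key_of_group[group] = key_id
            keys[key_id] = secrets.token_bytes(32)
            # Exactly the authorised users receive the key for this group.
            for user in group:
                user_keys.setdefault(user, set()).add(key_id)
        resource_key[resource] = key_of_group[group]
    return resource_key, user_keys, keys


acl = {"photo": {"alice", "bob"}, "diary": {"alice"}, "cv": {"alice", "bob"}}
resource_key, user_keys, keys = encryption_policy(acl)


def can_decrypt(user: str, resource: str) -> bool:
    """A user can decrypt a resource if and only if she holds its key."""
    return resource_key[resource] in user_keys.get(user, set())
```

In this sketch, can_decrypt(user, resource) holds exactly when the user appears in the access control list of the resource, which is the sense in which the two policies are equivalent. The number of keys a user has to hold grows with the number of distinct user sets, which is one reason why real schemes use more refined key management.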


1.6.3 Part III - User Interfaces for Privacy Management


Privacy-enhancing identity management will only be successful if its technologies
are accepted and applied by the end users. For this reason, the research and
development of user interfaces for PrimeLife technologies that are intelligible and
user-friendly, comply with legal privacy principles, and mediate
trust have played an important role in the PrimeLife project and have been
addressed by PrimeLife Activity 4 (HCI). The main research achievements of Activity
4 reported in this book can be structured by the Activity 4 work packages:

The first three chapters of Part III of this book report the main research results on
novel HCI methodologies for PETs and on mental models and metaphors for
privacy-enhancing identity management, which have helped to develop and evaluate UIs for
PrimeLife prototypes. Chapter 10 reports on PET USES, the Privacy Enhancing
Technology Self-Estimation Scale, which we have developed and used within PrimeLife
for evaluating user interfaces for privacy-enhancing technologies. Chapter 11
discusses the HCI development process and testing of PrimeLife prototypes.
Chapter 12 describes a series of mockups for anonymous credential selection based on
a card metaphor and analyses what effects the users’ mental models have on their
understanding of the selective disclosure property of anonymous credentials. In
particular, we investigate and compare the effects of the mental models of a card-based
user interface approach and an attribute-based user interface approach.

Chapter 13 considers Trust and Assurance HCI and investigates UI mechanisms
to provide and communicate the trustworthiness of Privacy and Identity Management
technology to the end users. For this, the iterative design process of a trust evaluation
function is described, which allows end users to evaluate the trustworthiness
of services sides in terms of their business reliability and their privacy practices.
Transparency can be a means for enhancing the users’ trust in PrimeLife technologies.
Chapter 13 also describes the iterative user interface design of the data track,
which is a user-friendly transparency tool consisting of a history function documenting
what personal data the user has revealed under which conditions, plus online
functions allowing a user to exercise her right to access her data at remote services
sides.

Our research on User Interfaces for Policy Display and Administration has elabo- 
rated user-friendly and legally compliant forms of privacy policy definition, admin- 
istration, negotiation, and display. Chapter 14 presents our work on user interfaces 
for a simplified management of privacy preferences and for a user-friendly display 
of data handling policies of services sides including information about how far they 
match the user’s privacy preferences. PrimeLife’s work on privacy policy icons for 
presenting policy components in a very visible and intuitive manner is presented in 
Chapter 15. 


1.6.4 Part IV - Policies to Control Privacy 


Machine-interpretable policy languages are a key part of any modern privacy infras- 
tructure. PrimeLife set out to collect policy language requirements from the diverse 
scenarios covered by the project, which are summarised in Chapter 16. After an 
analysis of the suitability of existing policy languages, it quickly became clear that 
none of them covered all the needs we discovered. 

The main highlight of the policy activity is the specification and implementation 
of the PrimeLife Policy Language (PPL), an integrated policy language allowing 
data controllers to express which data they need from data subjects and how this 
information will be treated, and at the same time allowing data subjects to express 
to whom and under which conditions they are willing to reveal their information. 
Chapter 17 focuses on the relation between access control policies and data han- 
dling policies, and describes an automated matching procedure by which means a 
proposed policy can be automatically matched against a data subject’s preferences. 
Chapter 18 introduces privacy-friendly access control policies by proposing the con- 
cept of “cards” as a generalisation of several existing authentication technologies, 
including anonymous credentials. Chapter 20 reports on the architecture and imple- 
mentation of the PPL engine that brings the advanced research concepts to life. 
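
To make the notion of automated policy matching more concrete, the following Python sketch compares a proposed data handling policy with a data subject’s preferences. It does not use PPL syntax or the PPL engine: the class Handling, the function mismatches, and the choice of purposes and retention period as the only matched properties are simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Handling:
    """Hypothetical, much simplified handling terms for one data item."""
    purposes: FrozenSet[str]
    retention_days: int


def mismatches(proposed: Dict[str, Handling],
               preferences: Dict[str, Handling]) -> List[str]:
    """List every point where the proposed policy exceeds the preferences."""
    problems = []
    for item, ask in sorted(proposed.items()):
        pref = preferences.get(item)
        if pref is None:
            problems.append(f"{item}: no preference permits disclosure")
            continue
        extra = ask.purposes - pref.purposes
        if extra:
            problems.append(f"{item}: purposes not allowed: {sorted(extra)}")
        if ask.retention_days > pref.retention_days:
            problems.append(
                f"{item}: retention of {ask.retention_days} days exceeds "
                f"{pref.retention_days} days")
    return problems


preferences = {"email": Handling(frozenset({"delivery"}), 30)}
proposed = {
    "email": Handling(frozenset({"delivery", "marketing"}), 90),
    "phone": Handling(frozenset({"delivery"}), 1),
}

for problem in mismatches(proposed, preferences):
    print(problem)
```

An empty result means that the proposed policy matches the preferences. A non-empty result could be shown to the user, who may then decide to refuse the request or to make an informed exception.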

Chapter 19 takes a closer look at the legal requirement under European law to 
transparently inform users about the usage of their information. Expressing such 
usage in an understandable way is a notorious challenge. Faced with the multitude 
of applications and usage purposes and with the lack of a structured ontology among 
them, this chapter investigates the current practices in data usage in various contexts 
and discovers a common structure. 

Finally, some of the most important concepts of the performed policy research, in
particular the results presented in Chapters 17 and 18, were brought together in the
design of the PrimeLife Policy Language (PPL). To be of use in real-world settings,
PPL was defined as extensions to the industrial standards XACML and SAML.

1.6.5 Part VI - Infrastructures for Privacy


While infrastructure aspects have a significant impact on the adoption, security and 
privacy functionality of IdM systems in general, and of privacy enhancing identity 
management systems in particular, they are often overlooked. One reason is the 
complexity of the infrastructure aspects, as often every element has some relation 
to every other element of an infrastructure, and identity management infrastructures 
must be interoperable among themselves or with existing legacy solutions. 

Part VI concentrates on the three most relevant aspects in infrastructures and 
infrastructure research: 


1. Privacy for service-oriented architectures: How can privacy be integrated into 
service-oriented architectures (SOA) that define more and more aspects of the 
Internet-based business? Chapter 21 first lists legal and technical requirements 
for privacy in service-oriented architectures. These requirements form the start- 
ing point for a technical framework that brings privacy enhanced data handling 
to multi-layered, multi-domain service compositions. Further, an abstract frame- 
work is described that is technology agnostic and allows late adoption also in 
already existing SOA applications. 

2. Smart mobile devices: Technologies and future directions for innovation. Chap- 
ter 22 elaborates upon the existing and upcoming technologies for an increasingly 
dynamic creation of services between front-end mobile devices and back-end 
servers and sketches how a conscious inclusion of security, identity-management 
and privacy-enhancement can be achieved in the future. 

3. Privacy by sustainable identity management enablers: To optimise sustainability,
an economic valuation approach for telco-based identity management enablers
is developed. This will enable telecommunications operators to learn the
relevant factors for assessing privacy-enhanced IdM enablers (Chapter 23).

Together, these chapters address the roles that networks (or network architectures),
devices, and services play for infrastructures, considering the interests of the
respective stakeholders.


1.6.6 Part VII - Privacy Live! 


One of the main objectives of PrimeLife was to bring the tools we developed into use 
and to make end-users and other stakeholders aware of what privacy-enhancing 
mechanisms can achieve. To this end, we have published a wide variety of code on the 
project’s website for download. Most of these tools are open source in the classical 
sense, and all of them can be used free of charge, with source code and other 
documentation available. Chapter 24 describes a selection of the tools we have 
published; for the complete list we refer to PrimeLife’s website.11 


11 http://www.primelife.eu/ 
As a second way to make privacy live, PrimeLife has contributed to standardisation 
organisations. The main focus here was on the relevant ISO working groups 
and the W3C-related bodies; our work is described in Chapter 25. 

As a third way to improve privacy in the real world, we considered how existing 
technology is used for privacy-relevant data processing. In many cases, it neither 
matches the provisions of European data protection regulation nor addresses 
society’s and individuals’ needs for maintaining privacy throughout a full lifetime. 
To address this, we elaborated on requirements and recommendations for all 
stakeholder groups involved. We present a selection of our best practice solutions 
that address different stakeholders in Chapter 26. 

Besides the results described in this book, we have also organised a number of 
summer schools. These schools had two facets. First, they brought together senior 
researchers and practitioners from many disciplines, most of whom gave keynote 
talks on different aspects of privacy. Second, PhD students were invited to submit 
their research results, and a selection of the research papers was presented and 
discussed at the summer schools. The revised papers were subsequently published as 
proceedings [BDFHMH10]. 

Last but certainly not least, all PrimeLife partners have published extensively at 
scientific conferences. A selection of the results presented at these conferences and 
published in proceedings and journals is described in this book. For the complete 
publication list we (again) refer to PrimeLife’s website. 
Part I 
Privacy in Life 


Introduction 


New information technologies are very popular and successful because individuals 
want to share personal and non-personal data with others wherever they are. 
However, individuals in the Information Society want to safeguard their autonomy and 
retain control over their personal information, irrespective of their activities. This 
means that users of information technologies always have to be aware of which data 
they are producing explicitly and implicitly. They need to know who their audience 
is and they have to decide whether to trust the members of their audience or not. In- 
formation technologies need to transfer the measures for establishing trust we have 
in the offline world (e.g., usage of ID cards, certifications of rights, word-of-mouth, 
...) to the Information Society. This raises additional privacy risks for individuals as 
the information about them for establishing trust needs to be linkable to them. Here, 
a paradox related to online interaction becomes apparent. On the one hand, online 
interaction means that more traces are left and that individuals can be recognised 
more easily, while, on the other hand, the physical distance between interacting parties 
makes it more difficult to define exactly with whom one is interacting. 

It was a first short-term goal of PrimeLife to provide scalable and configurable 
privacy and identity management in new and emerging Internet services and appli- 
cations such as virtual communities and Web 2.0 collaborative applications. This 
was achieved by setting up demonstrators showing how audience segregation for 
data can be done as documented in Chapter 2. Additionally, in Chapter 3, we de- 
scribe trust mechanisms to help users decide on trustworthiness in data delivered 
by others, with all users concurrently having privacy requirements on the data they 
request and deliver. The scenarios we chose for Chapters 2 and 3 cover a broad 
range of Web 2.0 applications (blogs, wikis, social networks, forums). The 
prototypes we built for these scenarios served as a basis for experiments to find 
out which indicators raise users’ awareness with respect to the privacy and 
trustworthiness of data. Unfortunately, within PrimeLife, it was not possible to 
conduct large-scale experiments with the general public. However, most of the tools 
we built for these scenarios are also available as open source for further 
deployment and usage. 

A second longer-term goal of PrimeLife as documented in Chapter 4 is to protect 
the privacy of individuals over their whole span of life. Each individual leaves a 
multitude of traces during a lifetime of digital interactions. The total of these traces 
forms a digital footprint. While part of these traces is unconsciously left behind 
and not meant as important information, a lot of information is very relevant and 
important for specific domains. This information, containing information about the 
individual, but also all sorts of documents, has to be accessible by someone at any 
time. This implies that, in case an individual is not able to manage this informa- 
tion, others need to obtain access. The inability to manage information can be either 
temporary, such as in the case of illness, or permanent, when the individual dies. 
As an exemplary demonstrator, we built a backup tool that takes into account 
new ways of interacting. The tool provides for backup and synchronisation of 
documents. In view of the specific requirements concerning privacy and identity 
management, mechanisms are built in to safeguard individuals, to improve control 
over the data and to allow for delegation. 


Chapter 2 
Privacy in Social Software 


Bibi van den Berg, Stefanie Pötzsch, Ronald Leenes, Katrin Borcea-Pfitzmann, and 
Filipe Beato 


Abstract While using social software and interacting with others on the Internet, 
users share a lot of information about themselves. An important issue for these users 
is maintaining control over their own personal data and being aware to whom which 
data is disclosed. In this chapter, we present specific requirements and realised so- 
lutions to these problems for two different kinds of social software: social network 
sites and web forums. 


2.1 Scenarios and Requirements 


In recent years, a new generation of the Internet has emerged, also known as ‘Web 
2.0’. One often quoted definition of Web 2.0 states that this term refers to a “set 
of economic, social, and technological trends, that collectively form the basis of 
the next generation of the Internet — a more mature, distinct medium characterised 
by user participation, openness, and network effects” [MSFO09]. Web 2.0 has four 
fundamental characteristics that set it apart from the first generation of the Internet 
(‘Web 1.0’): 


• Internet users have changed from passive consumers of information (searching 
for information and reading materials provided by others) into active creators 
of content [Tap09, How08, Lea08]. In Web 2.0, users can share their knowledge 
and information via a wide range of channels. Blogs, YouTube movies, wikis, 
file-sharing and consumer reviews are cases in point. 

• In Web 2.0, social interaction plays a central role. This is why Web 2.0 is also 
called ‘the social web’. 

• In many Web 2.0 environments, sharing and creating content and knowledge is 
not a solitary enterprise, but quite the reverse: the production and dissemination 
of information and entertainment services have a highly co-operative character. 
Participation and co-creation are key aspects of Web 2.0. 


J. Camenisch et al. (eds.), Privacy and Identity Management for Life, 33 
DOI 10.1007/978-3-642-20317-6_2, © Springer-Verlag Berlin Heidelberg 2011 
• Web 2.0 also differs from the first generation of the Internet in a technical sense: 
technology developers now create applications that are embedded into the 
Internet, and are accessible via any browser. Thus, the Internet has become the central 
platform for users to access different types of software [O’R07]. Moreover, 
software is offered to users as a service rather than as a product to buy separately. 


Since Web 2.0 is a new phenomenon — the term was first coined in 1999 but the 
massive take-off of this latest generation of the Internet is only a few years old — 
much is still to be learned with regard to both the benefits and the risks for users, 
businesses and governments in this new domain. Privacy issues relating to modern 
technologies have been high on the agenda of government officials around 
the world, researchers, and the broader public, and for good reason, since it is 
obvious that the emergence of Web 2.0 currently generates a wide range of new 
issues relating to privacy and security. 

As said, the success of the social web is based on the active participation of users, 
and on their willingness to contribute to the creation and improvement of content 
on the Internet by sharing data and knowledge [SGL06, O’R07]. When people use social 
software, a lot of personal data is disclosed either directly (think of real names and 
birth dates on social networking sites) or indirectly, for instance through editing 
specific topics in a wiki, commenting on blog entries or posting statements in a 
forum [GA05, EGH08]. Furthermore, personal data can be generated when second 
parties establish connections with a person or disclose information about that 
person, with or without his or her consent. While the possibilities of the social web 
may enrich people’s lives on the one hand, there are also privacy risks involved. Five central 
privacy issues can be distinguished with respect to information and communication 
technologies in general, and Web 2.0 applications in particular: 


• When sharing data and knowledge in social software, users lack an overview of 
who has access to this information: they cannot adequately judge the size and 
makeup of their audience [PD03, Tuf08]. 

• Information and communication technologies enable anyone to collect, copy, link 
and distribute the (personal) data of others, thus allowing for the creation of 
extensive profiles of individual persons. Information may also easily be copied 
outside the original domain, thus making it even harder for users to know who has 
access to their information [Hou09]. 

• Information and communication technologies allow storage of data for a nearly 
indefinite time period, thus making it impossible to erase or forget this 
information [MSO09]. 

• Participatory information and communication technologies such as social 
software enable anyone to publish another individual’s personal data, which may 
have serious consequences for the other’s reputation [Sol07]. 

• Individuals’ lack of privacy-awareness when using social software may lead to 
information leaks and to unintended and/or unobserved virtual traces being left behind. 


To find out which guises privacy issues take in the new generation of the 
Internet, much research has been conducted with regard to privacy in social software in 
recent years, and especially in social network sites. One of the central points with 
regard to information sharing and privacy in social network sites is aptly summarised 
by Acquisti and Gross. They write: “...one cannot help but marvel at the nature, 
amount, and detail of the personal information some users provide, and ponder how 
informed this information sharing is” [AG06]. Individuals use social networking 
sites for two main goals: (1) to present themselves to others in a virtual domain, and 
(2) to engage in, manage and strengthen relationships. Both of these involve several 
privacy risks as Scenario 1 (Section 2.1.1) will show. 

A much less researched, yet highly relevant domain for privacy issues in Web 
2.0 is that of collaborative workspaces and forums. Users primarily use these en- 
vironments to create and share content, rather than to present themselves and build 
relationships. However, while participating in these workspaces, they may inadver- 
tently disclose information about themselves that can undermine their privacy as we 
will show in Scenario 2 (Section 2.1.2). 


2.1.1 Scenario 1: A Social Network Site 


Natalie Blanchard is a member of the biggest social network site in the world: Face- 
book. She has a profile page, detailing information about her person, her hobbies 
and preferences. Through Facebook, she stays in touch with a number of contacts, 
both close friends and acquaintances. Natalie knows that users must be careful about 
sharing their personal information in social network sites because of privacy issues, 
so she has changed the privacy settings of her profile to ‘visible to friends only’. 
This means that only the members of her contact list can see the information she 
posts there. Natalie regularly posts pictures to her profile page, for instance of a trip 
she took to the beach, or of parties that she attended with friends. 

One day in 2009, Natalie receives a message from her insurance company, telling 
her that they will terminate the monthly payments she has been receiving for the last 
year and a half while on sick leave; she had been diagnosed with depression 
in 2007. Inquiries reveal that the insurance company had used Natalie’s Facebook 
page to investigate the validity of her ongoing claim for monthly payments, and had 
used the pictures of her, happy at the beach or laughing with friends, to conclude that 
Natalie was unjustly receiving these payments. It remains unclear how the insurance 
company gained access to the profile page, if Natalie had indeed shielded it from 
everyone who was not on her contact list. 

This scenario reveals that unintended audiences may sometimes access personal 
information in social network sites, and thus receive information that was not posted 
with them in mind. Based on what they see there, these unintended audiences may 
draw conclusions, and even undertake actions, that may harm the individual in- 
volved. Sharing information without having a clear grasp of the makeup and the 
extent of the audience, as is the case in social network sites, may thus have serious 
repercussions for users. 




2.1.2 Scenario 2: A Forum 


Hannes Obermaier works as a salesman in a big electronics company and his 
favourite hobbies are his family and gambling. He plays poker well and has even 
won a small amount of money in an online poker room. Unfortunately, Hannes 
forgot to indicate these earnings in his tax declaration and therefore he has a problem 
with his tax office. Seeking advice in this situation, Hannes finds a forum on 
the Internet where all kinds of questions related to online gambling are discussed. 
Hannes hopes to find help and creates a forum post in which he describes his 
problem. After a few minutes, another forum user has written the first reply to Hannes’ 
post, saying that he has experienced similar problems and asking for more 
details of Hannes’ case. During the next few days, Hannes spends a lot of time in 
the forum. He has gotten to know a lot of the other users and with three of them he 
really feels like they have been friends for ages. Of course, Hannes has told them 
not only about his problem with the tax office, but he also shared some funny stories 
from his everyday life and posted a link to a cute picture of his son from his personal 
homepage. 

One day, a new user who calls herself WendyXY appears in the forum and starts 
to post insults and allegations about Hannes, not just once but repeatedly. The scary 
thing is that she seems to know Hannes’ name, where he lives and for which com- 
pany he works. Later, Hannes realises that WendyXY may have found his personal 
homepage since he had posted the link to the picture of his son. His personal home- 
page contained Hannes’ real name and his residence. Knowing this information, it 
must have been easy for WendyXY to infer where he works since Hannes has briefly 
mentioned his job in earlier posts and there is only one big electronics company in 
his local area. 

The story becomes even worse when one of Hannes’ major clients finds all the 
allegations about him on the Internet and cancels an important contract for fear of a 
negative image. 

This scenario may seem artificial, yet a similar case was reported by the German 
newspaper Zeit in 2009 [Bur09]. It illustrates that the sharing of personal data with 
possibly millions of unknown people on the Internet is a critical point from a privacy 
perspective and may result in negative consequences, such as bullying, cyberstalking 
or harassment. However, note that the sharing of personal data with an intended 
audience — in the scenario for example talking about the tax office problem with 
other online gamblers — is the main reason to use forums or similar collaborative 
workspaces. 


2.1.3 General Requirements 


In both scenarios, users’ privacy could be protected through the use of audience 
segregation [Gof59], i.e., the compartmentalisation of different social circles, so that 
individuals can show different sides of themselves in each of these circles, without 
running the risk that information from other domains of their lives undermines the 
current self-presentation. In technical terms, one could argue that selective access 
control based on access control rules specified by the user is one feasible means to 
realise audience segregation. Due to the differences between social network sites 
and collaborative workspaces such as forums, different requirements emerge with 
respect to the way selective access control is implemented. 

In social network sites, users are directly connected to other members, whom 
they know (at least to some degree). This means that selective access control 
in social network sites mainly requires that users be able to 
make information visible to specific contacts (yet not others). This can be realised 
by enabling users to create multiple profile pages in a social network site, and to 
cluster contacts in subgroups so that information disclosure becomes more targeted 
and precise. We will outline these principles in more detail in Section 2.2. 
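
To make the idea of clustering contacts into subgroups more concrete, the following 
minimal Python sketch shows one way such selective access control could look. It is 
an illustration under our own assumptions and not the implementation of the 
prototypes described in Section 2.2: the names Profile, Item and visible_to, as well 
as the example groups and contacts, are hypothetical. 

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A piece of profile content and the contact groups allowed to see it."""
    text: str
    audiences: set[str] = field(default_factory=set)


@dataclass
class Profile:
    """Hypothetical profile whose owner sorts contacts into named groups."""
    owner: str
    # group name -> contacts the owner has placed in that group
    groups: dict[str, set[str]] = field(default_factory=dict)
    items: list[Item] = field(default_factory=list)

    def visible_to(self, viewer: str) -> list[str]:
        """Return the texts of all items that the viewer may see."""
        if viewer == self.owner:
            return [item.text for item in self.items]
        # Groups the viewer belongs to, as defined by the profile owner.
        viewer_groups = {g for g, members in self.groups.items() if viewer in members}
        # An item is visible if it targets at least one of those groups.
        return [item.text for item in self.items if item.audiences & viewer_groups]


profile = Profile(
    owner="natalie",
    groups={"close_friends": {"anna"}, "colleagues": {"bob"}},
    items=[
        Item("Beach trip photos", {"close_friends"}),
        Item("Conference slides", {"colleagues", "close_friends"}),
    ],
)

print(profile.visible_to("anna"))
print(profile.visible_to("bob"))
print(profile.visible_to("insurer"))
```

In this sketch, a viewer who is in none of the owner's groups sees nothing, which is 
the default one would want for an unintended audience such as the insurance company 
in Scenario 1. 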

In collaborative workspaces, such as forums, users are generally not connected 
to others, and they may not know any of the other members of the workspace. In 
these collaborative workspaces, privacy-enhancing selective access control can be 
realised based on general properties of the intended audience, without having to 
“know” each user in particular as we will show in Section 2.3. 
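
As a rough illustration of access control based on general properties of the 
audience, the following Python sketch grants access to a forum post whenever a 
reader holds all properties that the author requires. This is a simplification under 
our own assumptions: the names Post and can_read and the property labels are 
hypothetical, properties are plain strings, and how a forum would verify that a 
reader really holds a property is outside the scope of the sketch. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """A forum post with the properties a reader must hold to see it."""
    text: str
    # An empty set of required properties means the post is public.
    required: frozenset[str] = frozenset()


def can_read(post: Post, reader_properties: set[str]) -> bool:
    """A reader may see the post if they hold every required property."""
    return post.required <= reader_properties


post = Post(
    "Question about declaring poker winnings",
    frozenset({"registered", "active_member"}),
)

print(can_read(post, {"registered", "active_member", "moderator"}))
print(can_read(post, {"registered"}))
```

The author never names individual readers here; the audience is defined only by 
what its members have in common. 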


2.2 Two Prototypes for Privacy-Enhanced Social Networking 


2.2.1 Introduction 


Social network sites are one of the most flourishing branches of the social web. In 
2008, Facebook, the biggest social network site of all, had over 100 million users 
[Gri08]. In October 2010, the same social network had grown to more than 500 
million active users worldwide — ‘active users’ are defined by this network as indi- 
viduals who access the network every single day on average [Fac]. Users who have 
accessed the network only once, or use it less frequently, are not even counted in 
this number anymore, which means the total number of users must far exceed the 
phenomenal 500 million that Facebook focuses on. And Facebook is not the only 
social network site. There are literally thousands of different social network sites 
on the Internet. Roughly 200 of the biggest ones are listed on a Wikipedia page, 
which reveals that, on average, these sites gather hundreds of thousands of unique 
users and many go up to millions [Wik]. In March of 2009, Nielsen Online, an in- 
formation and media company that analyses online audiences and their behaviours, 
published a report that shows that the use of social network sites and blogs is now 
“the fourth most popular online activity, ahead of personal email” [BMO09]. 

While social network sites have become immensely popular worldwide, at the 
same time stories of privacy breaches on these sites are a regular topic in the popular 
press. There is an abundance of examples of social network site users who have 
not managed to find a job [Man10], lost a job [Yok07], had to pay extra income 
taxes [Ind08], or lost their right to health insurance payments [New09] because of 
information they had revealed through social network sites. Many empirical studies 
have been conducted in the previous years to reveal (a) how users perceive privacy 
on social network sites, and (b) how (mis)perceptions of privacy in relation to users’ 
own behaviour can lead to privacy violations in these domains. 


2.2.2 Privacy Issues in Social Network Sites 


One of the most fascinating aspects of users’ self-presentation in social network sites 
is the fact that they put such detailed and personal information about themselves in 
their profiles [Tuf08, YQH09]. In an article on the privacy risks for individuals using 
Facebook, Grimmelmann points out: 


“Facebook knows an immense amount about its users. A fully filled-out Facebook profile 
contains about 40 pieces of recognizably personal information, including name; birthday; 
political and religious views; online and offline contact information; sex, sexual preference 
and relationship status; favorite books, movies, and so on; educational and employment 
history; and, of course, picture. [...] Facebook then offers multiple tools for users to search 
out and add potential contacts. [...] By the time you’re done, Facebook has a reasonably 
comprehensive snapshot both of who you are and of who you know” [Gri08]. 


What’s more, “[a]ll of this personal information, plus a user’s activities, are stored 
in essentially a huge database, where it can be analyzed, manipulated, systematized, 
formalized, classified and aggregated” [RG10]. When viewed from a European legal 
perspective on privacy, almost all of this information (details on sexuality, religion, 
politics etc.) is considered ‘sensitive’ and hence requires “particular conditions and 
safeguards [...] when processed” [EB09]. 

After an extensive analysis of articles on privacy issues in social network sites, 
we conclude that such privacy problems may arise in five categories with respect to 
users.1 In some respects, these categories overlap with themes we have discussed 
in the introduction to this chapter, when discussing privacy issues in online worlds 
in general. However, on social network sites, these themes come together in a par- 
ticular way. We will discuss each of them in turn. 


2.2.2.1 Who is the Audience? 


In social network sites, “audiences are no longer circumscribed by physical space; 
they can be large, unknown and distant” [PD03]. It is difficult for users to know 
who exactly sees their information. The audience, to phrase it differently, is not 
transparent. On almost all social network sites, users can protect the visibility of the 


1 Privacy issues caused by third party businesses and by the providers of the social network site 
themselves are another serious threat. These types of privacy issues were discussed in Deliverable 
1.2.5 of the PrimeLife project. We have chosen to focus on privacy issues amongst users here only, 
since those are the issues that we attempted to solve primarily in building our Clique demonstrator. 




information they share through so-called ‘privacy-settings’. Generally, these enable 
them to set the visibility of their profile to one of four options: ‘visible to everyone’ 
(i.e., all users of the social network site), ‘visible to friends’ (i.e., visible to all the 
contacts listed in their contact list), ‘visible to friends-of-friends’ (i.e., visible to all 
the contacts in their contact list and to the contacts in the lists of all of these 
individuals), or ‘visible to no one’. As it turns out, many users, aware of the possible 
privacy risks in social network sites because of all the media attention for these risks, 
set their profile visibility to ‘friends-of-friends’. This sounds restricted enough to 
be ‘safe’ from prying eyes, yet open enough to find new contacts, and to stay in 
touch with a larger social circle than one’s own strict group. Moreover, many users 
understand the phrase ‘friends-of-friends’ to refer to, roughly, the group of people 
one would encounter when going to a friend’s birthday party. However, this is a 
grave misperception. On average, users in Facebook have 130 friends in their 
contact list. Aside from the fact that this so-called collection of ‘friends’ must consist 
of more than real friends, simple multiplication reveals what setting visibility to 
‘friends-of-friends’ means: when the 130 friends of each of 130 friends are allowed to 
view an individual’s profile information, almost 17,000 people (130 × 130 = 16,900) 
have access to that information, a very large group of people indeed. A non-transparent 
audience may easily lead to privacy problems, because users may unintentionally make 
information available to the wrong people, or to an unforeseen number of people. 
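
The multiplication above can be written out as a short Python sketch. The function 
name is hypothetical, and the result should be read as a rough upper bound: it 
assumes that every friend also has the average number of friends and that none of 
the contact lists overlap, which will rarely hold in practice. 

```python
def friends_of_friends_upper_bound(avg_friends: int) -> int:
    """Estimate the audience size for 'friends-of-friends' visibility.

    Assumes each contact has avg_friends contacts and that no two
    contact lists overlap, so the true audience is usually smaller.
    """
    return avg_friends * avg_friends


AVERAGE_FRIENDS = 130  # average contact list size quoted in the text
audience = friends_of_friends_upper_bound(AVERAGE_FRIENDS)
print(audience)  # 16900
```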


2.2.2.2 Context Collision or Lack of Audience Segregation 


Another key element of the fact that users do not have complete control over, or full 
awareness of, who sees the information they post in a social network site is what 
Raynes-Goldie has called ‘context collision’ [RG10]. When she conducted research 
on the disclosure of information in Facebook, participants told her they were very 
frustrated by the fact that in this social network site (and in many others) all contacts 
are clustered into a single group, without distinction between the myriad of social 
relations and the various levels of intimacy one has with different contacts in real 
life. This leads to a 


“... flattened Friend hierarchy, where by default, everyone is given access to the same per- 
sonal information. As a result a user’s teetotaller boss sees the same things as their best 
friend, the party animal. This can cause problems when trying to decide what to share about 
yourself, or trying to manage how people from different life contexts might perceive [in- 
formation]. What is appropriate for a user’s friends to see may not be appropriate for their 
employer” [RG10]. 


Context collision entails that individuals are no longer able to meet the various 
behavioural requirements of the many different social settings in which they nor- 
mally operate, since one and the same audience sees all of their behaviours. Whereas 
individuals can keep various social settings separate in real life, for instance because 
these social settings are connected to distinct physical places (work, home, public 
space, etc.), in virtual worlds such as social network sites “intersections of 
multiple physical and virtual spaces, each with potentially differing behavioral 
requirements” may arise [PD03]. 

When phrased in the vocabulary of the twentieth century sociologist Erving Goff- 
man, whose perspective on identity and self-presentation was influential in our anal- 
ysis of social interaction in social network sites, what users in social network sites 
lack is a means for ‘audience segregation’ [Gof59]. Goffman emphasises the fact 
that human beings need such a segregation between audiences, since they perform 
different, possibly conflicting, roles throughout their everyday lives, and the im- 
pressions they aim to foster in each of these roles must not be contaminated by 
information from other performances. With segregated audiences for the presenta- 
tion of specific roles, performers can ‘maintain face’ before each of these audiences. 
In social network sites, the clustering of social relations into a single list of contacts 
defeats this important feature of everyday social life. Context collision 
and context collapse in social network sites are caused by users’ lack of means for 
audience segregation. When the audience consists of individuals from many 
different contexts of an individual’s life, brought together in one group to view all of the 
individual’s behaviours in a social network site, then it is clear that this diminishes 
the individual’s chances of protecting and maintaining his or her various ‘faces’. Thus, it 
may lead to minor or more serious privacy risks. 


2.2.2.3 Persistence of Information 


As we have seen above, the persistence of information in online worlds forms a 
threat to users’ privacy. As Palen and Dourish say, “the recordability and subse- 
quent persistence of information, especially that which was once ephemeral, means 
that audiences can exist not only in the present, but in the future as well” [PD03]. 
This also applies to social network sites. Information posted on one’s social network 
site profile may be accessed by (known and unknown) individuals in years to come. 
Moreover, since information can be copied, saved and stored easily and indefinitely, 
information placed on one’s profile on a social network site at any particular mo- 
ment may come back to haunt the individual years down the line. This means that 
the audience is not only unlimited in terms of its size and makeup (in contrast to 
audiences in the physical world), but also in terms of temporality. 


2.2.2.4 Peer Surveillance, Snooping and Gossiping 


Users of social network sites can use the search functionality of these sites to find 
others’ profile pages, and may navigate through the profiles of others using the con- 
tact lists of individuals they have befriended. Depending on the visibility settings of 
each profile page, quite a significant amount of information about others may thus 
be gleaned. Navigating the profile pages of other users in this way possibly invites 
socially undesirable or even harmful behaviours. For one, gossiping and snooping 
are facilitated by it. As Hough points out, 
“[t]oday, technology enables us to gossip or otherwise exchange information with 
millions of people instantaneously. [...] Social network sites such as Facebook, reunion.com, 
and classmates.com enable the resurrection of those embarrassing youthful escapades and 
awkward photographs we all wish would stay buried. Often, the postings are captured on 
camera-enabled cellphones without the knowledge of the subjects and uploaded without 
their consent, leading to instant notoriety, long-lasting embarrassment, and a loss of reserve 
that may never be recaptured” [Hou09]. 


Other researchers have pointed out that social network sites are breeding grounds 
for surveillance between peers. Adam Joinson writes that 


“social networking sites like Facebook may [...] serve a surveillance function, allowing 
users to ‘track the actions, beliefs and interests of the larger groups to which they belong’ 
[...]. The surveillance and ‘social search’ functions of Facebook may, in part, explain why so 
many Facebook users leave their privacy settings relatively open [...]. [Social network sites 
offer users a] ... unique affordance [...] [to] view other people’s social networks and friends 
[...]. This ability to find out more about one’s acquaintances through their social networks 
forms another important surveillance function” [Joi08]. 


Peer surveillance, snooping and nosing around may all lead to privacy issues for the 
parties subjected to them. 


2.2.2.5 Who Controls a User’s Information? 


On social network sites, users can create a user profile on which they can present 
themselves. This profile page is the starting point for setting up connections with 
other users within the same environment. On the profile page, users can choose what
information to share about themselves and, as explained above, to some extent who
can view this information (e.g., everyone or friends only). Therefore, in
theory at least, users have some control over the image they create of themselves on 
their profile page. As research has shown, young people especially perceive them- 
selves as having a considerable degree of control over their disclosure of personal 
information online, and it turns out that they share such information in full aware- 
ness of the associated risks, because they have a high degree of confidence in their 
ability to manage potentially negative outcomes [BK09]. 

However, the control that users have over their own profile page and personal 
information in social network sites only goes so far. Other users can add or change 
information in a user’s personal profile, put pictures or information about him or 
her on their own or other people’s profiles, and tag pictures to reveal the identities 
of those portrayed in them. This can have serious consequences: placing a picture 
of another person online affects the image of that person to the audience viewing 
it, and hence may have an effect on the (current and future) self-presentations and 
impression management of that individual. 


2.2.3 Clique: An Overview


In our research, we have developed two demonstrators that may contribute to solving 
some of the privacy issues in social network sites that we have discussed above. The 
first demonstrator is a privacy-enhanced social network site called ‘Clique’, in which 
we have turned theoretical notions on audience segregation and context collision 
into a real-world online practice. The key solutions implemented in Clique are: 


• providing users with the option to create their own ‘collections’ in which social
contacts can be clustered;

• providing users with the option to create multiple ‘faces’ to mimic the practice
of audience segregation in real life;

• providing users with fine-grained options to define the accessibility of their postings
and content in social network sites, thus mimicking the rich and varied texture
of relationships in real life.


Clique was built using Elgg, open source software for developing social network
sites.² The key ideas developed in Clique are:


• mimicking ‘audience segregation’ in a social network by (1) providing users with
the option to create their own ‘collections’ in which social contacts can be clustered,
and (2) providing them with the possibility to create multiple ‘faces’ to
compartmentalise their online social visibility;

• providing users with fine-grained options to define the accessibility of their postings
and personal information in Clique.


2.2.3.1 Audience Segregation in Clique 


The first principle we realised in Clique to enhance users’ ability to protect their 
privacy and to increase their options for adequate and realistic self-presentation was 
a translation of Goffman’s notion of ‘audience segregation’. This is implemented 
through two mechanisms: 


• Users can divide their list of contacts into ‘collections’, clusters of contacts that
are socially meaningful, relevant and efficient for them. Each cluster contains
one or more contacts; contacts may be listed in as many different collections as
the user likes;

• Users can create multiple profile pages for their account. We call these pages
‘faces’. On each profile page, users can show different ‘sides’ of themselves,
thereby mimicking audience segregation through physical distantiation (e.g.,
showing different sides of oneself at home than at work) in real life.


2 See http://www.elgg.com

3 Note that Clique uses a separate social graph for each face to ensure that audiences are indeed 
kept separate. The social graph collects all of the contacts a user has in relation to this specific 
profile page, and designates the relationships between them. Collections are a means of assigning 
access rights to each of the nodes (i.e., contacts) in the social graph. 

As we have explained above, on many social network sites, all contacts in a user’s
network are lumped together into one category. No distinction is made between 
the different social spheres to which individuals belong in their everyday lives. This 
means that all contacts are exposed to the same information shared by the individual, 
regardless of how close they are to that individual. This issue has been solved in 
Clique through the use of ‘collections’. By allowing users to create ‘collections’ 
within their network of contacts, they can cluster social relations according to their 
own preferences, and thereby mimic the actual practice of building and maintaining 
separate social spheres in real life. We have gone to great lengths to ensure that 
users themselves are in control of the creation and labeling of collections. After all, 
they themselves know best what the fabric of their own social lives consists of and 
how it could be divided into relevant and meaningful categories. In Clique, users 
can choose (1) how many collections they wish to make (i.e., how fine-grained they
want their audience control to be), and (2) which labels to use for each collection.
However, to enhance user-friendliness we also provide them with a set of predefined 
collections, for instance ‘family’, ‘colleagues’, and ‘acquaintances’. 

Below are some screenshots that show the way in which the creation and man- 
agement of collections is implemented in Clique. Figure 2.1 shows the way in which 
users can add a new collection to their profile page. They begin by typing in a name 
for the collection, in this case ‘My favourite colleagues’. Then individual contacts
from the user’s contact list — that is, individuals that he or she has befriended before- 
hand — can be added to the collection by clicking through the alphabet and selecting 
those individuals the user wants to include in this collection. The letters of the al- 
phabet are bold if there is a contact whose user name starts with that letter, and grey 
if not. After selecting one or more contacts to add to the collection, the user can 
click ‘save’ and the collection is added to his profile. Figure 2.2 shows an overview 
of the collections this user has made and outlines how many contacts are in each 
collection. 

The second way of realising audience segregation revolves around the idea of 
creating ‘faces’, so that users can show different ‘sides’ of themselves to different 
audiences through the use of multiple profile pages. On most social network sites, 
users are allowed to create only one profile per person, and hence can create only 
one context in which all of their information is gathered. However, in real life, indi- 
viduals move from one context to the next throughout their everyday life — they go 
to work, visit friends, or spend time at home with their families. To avoid the risk
of ‘context collision’, many users currently maintain different profile pages on different
social network sites. For instance, they have a work-related page in LinkedIn
and a page for family and friends in Facebook. This is a time-consuming and cum- 
bersome solution, since users have to log onto different platforms to manage their 
profiles and keep track of contacts and their activities in each domain. 

Fig. 2.1: Creating a new collection in Clique.

Fig. 2.2: Overview of a user’s collections.

To solve this issue, we enable users to create multiple ‘faces’ in Clique to mimic
the various social contexts in which they participate in real life (for instance, a work
face, a private face, a face for the tennis club, etc.). When a user accesses his profile
in Clique, his various faces are visualised with separate tabs. By clicking on a tab,
the user can access the page attached to that face. Figure 2.3 shows a screenshot
of the author’s faces in Clique. New faces can be added by clicking on the tab at
the far right, which reads ‘+ Add a new face’. As the figure shows, existing faces
can be enabled, temporarily disabled or removed entirely. Each of these faces has its
own list of contacts and its own list of collections. Using tabs to distinguish between
different faces is a visually appealing and easy way for the individual to manage his
or her own profile and the various faces contained therein. Information added to one
of the tabs is invisible in all other tabs, and hence it is easy for the user to manage
who sees what.

Fig. 2.3: ‘Faces’ in Clique.


2.2.3.2 Control Over the Visibility of Individual Items of Information in 
Clique 


To further strengthen users’ capacity to control who sees which information in 
Clique, we’ve added two more features. First, each time a user posts an item of 
information (pictures, blog entries, messages), the system asks the user (1) in which 
‘face’ he wants to make it available, and (2) to which collections and/or individual 
users. Second, when disclosing personal information on their profile page, users are
also asked to make these choices for each separate item of information (e.g., a
telephone number or place of residence). These measures further prevent information
from spilling over from one context to the next or leaking to unintended audiences.
While this feature is demanding on the part of users (it requires them to go through
one extra step each time they want to share information with others in Clique),
it is explicitly designed to raise user awareness with respect to audiences and the
disclosure of personal information. Moreover, we have designed the interface in a
user-friendly way, so that users can go through this process quite quickly⁴ and in an
intuitive way. Figure 2.4 shows the screen for setting visibility rights in Clique.

Fig. 2.4: Setting visibility rights to information items in Clique.
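
The interplay of faces, collections and per-item visibility described in this section can be summarised in a small data model. The following Python sketch is only an illustration under our own assumptions: the class and method names (Face, Item, add_collection, visible_to) are hypothetical and are not taken from the Clique or Elgg code base.

```python
from dataclasses import dataclass, field


@dataclass
class Face:
    """One profile page ('face') with its own contacts and collections."""
    name: str
    contacts: set = field(default_factory=set)
    collections: dict = field(default_factory=dict)  # label -> set of contacts

    def add_collection(self, label, members):
        # Only contacts already befriended within this face can be clustered.
        unknown = set(members) - self.contacts
        if unknown:
            raise ValueError(f"not contacts of face {self.name!r}: {sorted(unknown)}")
        self.collections[label] = set(members)


@dataclass
class Item:
    """A posting or profile field bound to one face and an explicit audience."""
    face: Face
    content: str
    collections: set = field(default_factory=set)  # collection labels
    individuals: set = field(default_factory=set)  # individually named contacts

    def visible_to(self, viewer):
        # Visible if the viewer was named directly or sits in a chosen collection.
        if viewer in self.individuals:
            return True
        return any(viewer in self.face.collections.get(label, set())
                   for label in self.collections)


work = Face("work", contacts={"ann", "bob", "cem"})
work.add_collection("My favourite colleagues", ["ann", "bob"])
post = Item(work, "Lunch at noon?", collections={"My favourite colleagues"})
```

Because every item refers to exactly one face, and collections are looked up only within that face, information cannot spill over into another face in this model. A contact such as "cem", who is in the work face but not in the chosen collection, does not see the posting.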


2.2.4 Scramble!: An Overview 


Another demonstrator we have developed in the PrimeLife project to solve some
of the issues surrounding privacy and security in self-presentation on social network
sites (and also in other Web 2.0 environments) is an encryption tool called
Scramble!. In previous years, several researchers have pointed to the need for access
control through encryption in social network sites such as Facebook, MySpace or
Twitter. Some of their ideas have been turned into tools such as Lockr⁵ and FlyByNight
[LB08]. However, these tools all have shortcomings.

For instance, Lockr uses trusted third-party storage for the hidden information,
which means that the user does not have full control over his own data, but rather
has to place trust in a (commercial) third party to guarantee a safer use of social
network sites. The other tool, FlyByNight, only works on Facebook, and relies on the
servers of the social network itself for encryption key management. The decryption 
algorithm is implemented in JavaScript and retrieved from Facebook. Consequently, 
while FlyByNight is browser-independent and portable, its biggest disadvantage is
that it is not secure against active attacks by the social network provider, Facebook.
Both of these tools thus contain serious privacy risks.


4 If a user decides to display the information to his ‘default’ collection, it is only one extra mouse
click. If the user decides to share it with other collections and/or individuals, this still should not
take more than a few seconds to accomplish.

5 See http://www.lockr.org/

Scramble!, the tool we built, addresses the attacker model limitations of Lockr
and FlyByNight. It runs primarily on the user side and, unlike FlyByNight, has no
dependencies on any specific social network site. Moreover, unlike Lockr, it does
not use a third party to store information. Also, what our access control mechanism
enables, and what the other existing tools lack, is selective access control. We will
explain what this means below. Scramble! has the following key goals:


• it enables users on social network sites to formulate which individuals have access
to the content and personal information they place in, or attach to, their
profile;

• all of the information (both content and personal data) that users place online is
encrypted and will only be visible to those contacts and/or collections that have
the appropriate access rights. Individuals and collections without access rights
cannot see the information, and neither can the social network site’s provider;

• it aims for ease of use in order to strike the difficult balance between usability
and privacy for general users.


2.2.4.1 Selective Access Control in Scramble! 


With Scramble!, we have aimed to use cryptographic techniques to enforce access
control. In this prototype application, we use the OpenPGP⁶ standard [CDF+07] to
keep social network users’ data confidential. One nice feature of OpenPGP is that
it supports encrypting to multiple recipients using hybrid encryption: the content is
encrypted with a random secret, and the secret is encrypted with the public key of
each user in the set of recipients. We assume that each user holds an OpenPGP key
pair consisting of a public and a secret key.
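
To make the structure of hybrid encryption to multiple recipients concrete, the following Python sketch encrypts the content once with a random secret and then wraps that secret separately for each recipient. It is a toy model under loudly stated assumptions and is not OpenPGP: it uses textbook RSA with small hard-coded primes and a hash-based keystream, both of which are insecure and serve only to keep the example self-contained. All function names are hypothetical.

```python
import hashlib
import secrets

E = 65537  # public exponent used by every toy key below


def make_keypair(p, q):
    """Return (public, private) textbook-RSA keys built from primes p and q."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return {"n": n}, {"n": n, "d": d}


def keystream_xor(key, data):
    """XOR data with a SHA-256 counter keystream; applying it twice restores data."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], pad))
    return bytes(out)


def encrypt_for(recipients, plaintext):
    """Encrypt once with a random secret, then wrap that secret per recipient."""
    secret = secrets.token_bytes(16)
    wrapped = {
        name: pow(int.from_bytes(secret, "big"), E, public["n"])
        for name, public in recipients.items()
    }
    return {"wrapped": wrapped, "body": keystream_xor(secret, plaintext)}


def decrypt_as(name, private, message):
    """Unwrap the secret with the reader's private key; None if not a recipient."""
    if name not in message["wrapped"]:
        return None
    secret = pow(message["wrapped"][name], private["d"], private["n"])
    return keystream_xor(secret.to_bytes(16, "big"), message["body"])


# Toy keys from known Mersenne primes (far too small and structured for real use).
alice_pub, alice_priv = make_keypair(2**89 - 1, 2**107 - 1)
carol_pub, carol_priv = make_keypair(2**107 - 1, 2**127 - 1)
message = encrypt_for({"alice": alice_pub, "carol": carol_pub}, b"Party on Friday")
```

The body is stored only once, however many recipients are chosen; only the small wrapped secret is repeated per recipient. Anyone without an entry in the wrapped-secret table, including whoever stores the message, has nothing to unwrap.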

For key distribution, we assume that users exchange their public keys over an
authenticated offline channel when a friendship connection is established. Users
can also make the public key available through the provider or any key server and
distribute the fingerprint via the offline channel. In this way, users can verify the
authenticity of the public key. Since Scramble! makes use of the OpenPGP standard, it
can build on the OpenPGP key management infrastructure, retrieving the key from
an online key server by name or e-mail mapping.
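
The fingerprint check mentioned above can be sketched as follows. This is a minimal illustration, assuming that the key was downloaded from an untrusted server and that the fingerprint was received through the offline channel. The function names are hypothetical, and the fingerprint here is a plain SHA-256 digest of the key bytes, which stands in for the fingerprint computation that OpenPGP defines over its own key packet format.

```python
import hashlib
import hmac


def fingerprint(public_key_bytes):
    """Hex digest identifying a key (illustrative stand-in, not the OpenPGP format)."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def accept_key(downloaded_key, fingerprint_from_offline_channel):
    """Accept a downloaded key only if it matches the fingerprint obtained offline."""
    # Tolerate the spaces and upper case in which fingerprints are often written.
    expected = fingerprint_from_offline_channel.replace(" ", "").lower()
    return hmac.compare_digest(fingerprint(downloaded_key), expected)
```

A key that a server or provider has swapped for another one produces a different digest and is rejected, which is why the fingerprint, and not the key itself, is what needs to travel over the authenticated channel.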

As an example of the flow, let Alice and Bob be two users on a social network 
site. Bob accepts Alice as his friend. He then adds Alice’s public key to his key ring, 
thereby including Alice in a circle of trust. Then, Bob can post encrypted messages 
that can only be accessed by a selected audience chosen from Bob’s circle of
trust, which now includes Alice. 

To realise selective access control on social network sites based on these principles,
we have built a Firefox extension with several features. First, access control is
enforced through the use of an encryption protocol (as said, based on OpenPGP),
which enables users to trust that their content and personal
information is invisible to anyone who does not have the right access key. Note that
this includes not only other members of the social network site, but also the social
network site provider. Second, the user himself can define the access rights handed
out to various members of his own social circle, either to individuals or to collections,
or to both. Figure 2.5 shows two of the windows in which users can
define access rights for particular items of content and for personal information.

Fig. 2.5: The Scramble! address book, from which selective access rights can be
managed.


6 OpenPGP is one of the world’s most widely used encryption standards. For more information,
see http://www.openpgp.org/


Since Scramble! is a Firefox extension, it is not linked to a specific social network
site, such as Facebook or MySpace, but works across platforms. We have done
extensive testing with it in Facebook and MySpace, and also in our own social network
site Clique. When a user has the plugin installed and switched on, each time he
or she accesses or receives information posted by others, a check is run as to
whether or not he or she has the proper access rights to read the information. If
so, the content is automatically decrypted and transparently displayed as normal
text. If the user has the plugin installed and switched on but does
not have access rights to the information, the information is concealed or
replaced by gobbledygook. Figure 2.6 shows what this looks like in Facebook.

Fig. 2.6: Scramble! is switched on but the user does not have access rights.

Those who have not installed the plugin have no access to the information at
all and instead see a so-called ‘tiny URL’ on their screen, a hyperlink referring to
an address where the encrypted text is stored, instead of the decrypted information
placed in the profile by the user. Figure 2.7 shows what this looks like in an earlier
version of Clique.

Fig. 2.7: Scramble! is not installed: access is denied and the ‘tiny URL’ is displayed.
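
The three cases a viewer can encounter (plugin not installed, plugin on without access rights, plugin on with access rights) amount to a small decision procedure. The Python sketch below only illustrates that logic and is not the extension's code: fetch_ciphertext and try_decrypt are hypothetical callables standing in for the lookup of the stored ciphertext and for OpenPGP decryption, and the placeholder string is our own choice.

```python
def render_posting(tiny_url, plugin_on, fetch_ciphertext, try_decrypt):
    """Return the text one viewer sees for a scrambled posting."""
    if not plugin_on:
        return tiny_url  # without the plugin only the link is visible
    plaintext = try_decrypt(fetch_ciphertext(tiny_url))
    if plaintext is None:
        return "[scrambled content]"  # plugin on, but no access rights
    return plaintext  # plugin on and the viewer holds a matching key
```

In this sketch, try_decrypt is assumed to return None when the viewer's key is not among the recipients, so the same call serves both as the access check and as the decryption step.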


2.2.4.2 Scramble! and Web 2.0


Since this encryption tool is built as a browser extension, it isn’t merely independent 
of the various social network sites, but could also be used in other kinds of Web 2.0 
environments, such as blogs, collaborative workspaces such as wikis, and forums. It 
could be used in any environment in which individuals fill in fields through which 
they exchange information. Using it is very simple. After installing the plug-in, a
user can select any kind of text in fill-in fields on Internet pages and simply choose
‘Scramble!’, after which the content will only be readable for those who have the
right access key. Moreover, because the extension is integrated into the browser the 
encrypted content will be decrypted automatically for all authorised users, without 
their intervention. The exchange of keys runs in the background and requires no 
particular skills from the owner of the information, nor from those gaining access 
to it. Our extension is simple and aims at striking the difficult balance between 
usability and privacy for general users. 


2.3 Privacy-Enhancing Selective Access Control for Forums 


2.3.1 Objectives 


User-generated content in forums may contain personal data in the sense of personal 
information, personal ideas, thoughts and personal feelings. In contrast to explicit 
profiles where, e.g., the date of birth is a specified data item saying “12 June 1979,” 
the same personal data can be stated in a forum post which is tagged with the date 
of writing and says “I got two concert tickets for my 30th birthday yesterday!” In the
latter case, it may not be immediately obvious to the user that she has disclosed
her date of birth on the Internet.

The disclosure of personal data in forums and other social software is critical
from a privacy perspective. From a social perspective, however, the disclosure is
necessary, since the exchange of information, both personal and non-personal, is the key
feature of the application and the primary reason for people to use it. Hence, it is not
our objective to prevent disclosure of personal data in forums. Rather, we want people
to be aware of privacy and to enable them to specify more selectively to whom they
disclose their data. Access control settings of currently available forums are specified
once by the provider and cannot be changed by the individual user. Thus, the
user can only decide to disclose information to the groups specified by the provider
(in the worst case, this means disclosing to the public) or not to disclose anything
at all. Since the first option is not preferable from a privacy perspective and the second
option is not preferable from a social perspective, our objective is to develop a
user-centred selective access control system for forums. Forum users should be able
to protect their privacy through safeguarding their contextual integrity: data that is
disclosed before an intended audience should not spill over into other contexts and
hence possibly have damaging consequences. That is, a user who wants to share her
happiness about her birthday present with some other users (e.g., other people going
to the same concert) should be able to specify appropriate access control rules for her
post in the forum. Besides the implementation of purely technical means, we further
emphasise that it is necessary to sensitise users with respect to privacy in order to
arrive at a comprehensive solution. Therefore, raising users’ awareness of the issue is
another central goal in the direction of privacy-enhancing selective access
control.


2.3.2 Introducing phpBB Forum Software and PRIME Framework 


To demonstrate how an existing application can be extended with privacy-enhancing 
selective access control, we have chosen to build an extension for the popular forum 
software phpBB [php]. In doing so, the main principles of the original system should be
preserved. As in other forums, content in phpBB is always structured in a specific
way to make it easily accessible, easy to use, and searchable. In the following, we
briefly explain the content structures of the phpBB platform, which are illustrated in
Figure 2.8.


Fig. 2.8: Overview of the hierarchy of a phpBB forum.

The top-level resource is the forum itself, which is assigned a title and presented
to the user when she first enters the platform. An administrator is responsible for
managing all general issues of this forum. The forum is subdivided into topics that
each address a different subject matter for discussion. For each topic, moderators are
responsible for ensuring that the content complies with ethical standards and forum
rules. Thus, they have the possibility to change subjects or content of posts, and to lock
or even delete posts. Individual discussions focusing on particular aims are called
threads. These are created with the submission of a starting post, to which users can
reply by submitting reply posts.

The phpBB software is available under a so-called “copyleft license” and is developed
and supported by an open source community. This means that the original phpBB source
code is available to the public and fairly well documented. With respect to the
technical realisation of our selective access control extension, we rely on privacy- 
enhancing mechanisms that were previously developed in the European project 
PRIME [PRI]. More precisely, we used credentials and access control policies from 
the PRIME framework. 


2.3.3 Extending phpBB with Selective Access Control 


The most common approaches to access control include the access control list
model, the role-based and the group-based access control approaches. All three require
a central instance that defines lists, roles, or groups based on user names, i.e.,
identifiers of user accounts (cf. [PBP10]). However, social and collaborative interaction
in forums does not necessarily require an association of users by their names.
Therefore, privacy-enhancing selective access control in forums requires mecha- 
nisms that are not based on the knowledge and existence of names or other par- 
ticular identity information. Furthermore, it is important that users themselves can 
decide what they deem to be sensitive or intimate information, rather than what is 
evaluated as such by lawyers, computer specialists or other third parties [Ada99]. 
This is why we argue that users themselves need to be given control over the audi- 
ence to whom they disclose data, and hence access control rules need to be set by 
the user, being the owner of a resource (e.g., a post), instead of by an administrative 
party. This implies that it should be possible to specify access control rules not only
for the whole forum or for topics, but also for threads and particular posts. We need
to consider that forum platforms typically provide the roles “administrator” for
addressing technical issues and “moderator” for content-related moderation of topics.
Our approach should allow for keeping both roles. Hence, we can list the following
specific requirements for privacy-enhancing selective access control in a forum,
which can be generalised to further kinds of social software:


• Other persons, who should or should not be able to access the personal data, are
not necessarily known by the user.

• These other persons also have an interest in protecting their privacy.

• Each user, rather than an administrative party, should be able to define and modify
access rules for her contributions, i.e., personal information, expression of personal
thoughts and feelings.

• User-controlled and privacy-respecting access control can be applied to different
levels of content granularity (e.g., forum, topic, thread, post).

• An administrator of the forum should be able to address technical issues of the
platform, but should not necessarily have access to content data.

• Moderators should be able to moderate particular topics.

• The owner of a resource is always able to access it.


To address these points in our prototype, we let the user define access control
policies together with her contribution, indicating the properties a reader has to
possess and to prove. In order to also protect the privacy of readers, properties can be
presented in an anonymous way and are not linkable when used repeatedly. Therefore,
we relied on the concept of anonymous credentials proposed by Chaum in
1985 [Cha85] and technically realised in the Identity Mixer (short: Idemix) system
[CL01, CvH02]. The idea of access control based on anonymous credentials
and access control policies is not new in general and was demonstrated in
selected use cases for user–service provider scenarios in the project PRIME
([ACK+10, HBPP05]). We built on the results of PRIME, transferred the ideas to
social software and demonstrated the practical feasibility of maintaining the existing
concepts of the phpBB platform while integrating privacy-enhancing functionality
provided by the PRIME framework.

Using anonymous credentials, everyone can prove the possession of one or
more properties (e.g., being older than 18, having more than 10 forum posts
with a 5 star rating) without revealing the concrete value (e.g., being exactly 56
years old, having exactly 324 posts rated with 5 stars). In the prototype, credentials
are also used to prove the possession of a particular role, which may be
required by an access control policy. This implies that, when a new resource is
created, the originator of that resource receives the corresponding
credential (cred:Owner-Thread-ID or cred:Owner-Post-ID)
from the forum platform and stores it on the local device. The roles administrator
and moderator are realised with the help of the credential-based access control
approach as well, i.e., the corresponding credentials (cred:Admin-Forum and
cred:Moderator-Topic-ID) are issued to the corresponding persons. Together
with a new resource, default access control policies are created, which ensure
that users who show the administrator credential or moderator credential are granted
the access required to fulfil their roles. The owner of a resource possessing the
owner credential always has access to that resource and can modify the access control
policies to, e.g., allow other users with certain provable properties read access
and maybe also write access to the resource.
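
The life cycle described above (issue an owner credential on creation, attach a default policy, let the owner widen access) can be sketched in a few lines of Python. This is an illustration under our own assumptions rather than the prototype's implementation: credentials are modelled as plain strings, so the anonymity and unlinkability provided by Idemix are not captured, and the function names and the policy layout are hypothetical.

```python
def create_resource(kind, resource_id, topic_id, local_store):
    """Issue the owner credential and build the default read policy."""
    owner = f"cred:Owner-{kind}-{resource_id}"
    local_store.add(owner)  # the originator keeps the credential on her device
    moderator = f"cred:Moderator-Topic-{topic_id}"
    return {"owner": owner, "read": {moderator, owner}}


def grant_read(policy, shown_credentials, new_credential):
    """Let the owner, and only the owner, widen read access."""
    if policy["owner"] not in shown_credentials:
        raise PermissionError("only the owner may modify the policy")
    policy["read"].add(new_credential)


def may_read(policy, shown_credentials):
    """Read access is granted if any shown credential appears in the policy."""
    return bool(policy["read"] & set(shown_credentials))
```

For example, after creating a post the owner could call grant_read with a credential such as cred:memberOfGamblingSite, after which any reader who can show that credential passes may_read without the owner ever learning a user name.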

In general, credentials are offered by particular trustworthy organisations, so- 
called credential issuers. Credential issuers need to be known to the public, so that 
everybody has a chance to get credentials certifying properties of the user. More 
details on the technical implementation of the prototype can be found in [Pri10a,
Pri10b].


2.3.4 Scenario Revisited


Returning to the forum scenario from Section 2.1.2, the following alternative story 
illustrates how access control based on credentials and access control policies in 
a web forum works. Assume Hannes posts a message to the thread “Online Gam- 
bling” in a publicly accessible forum. The access control policy of the thread is 
derived from the parent topic, which is set to be open for reading and writing exclusively
for people who have proven to be registered with an online gambling website.
Hannes additionally restricts access to his post to allow only gamblers who have been
registered with their site for at least 3 months.


Table 2.1: Example of an access control policy. 


(1) Forum:  [(cred:Admin-Forum) OR (everybody [default])] AND
(2) Topic:  [(cred:Moderator-GamesCorner) OR (everybody [default])] AND
(3) Thread: [(cred:Moderator-GamesCorner) OR (cred:Owner-OnlineGambling) OR
            (cred:memberOfGamblingSite)] AND
(4) Post:   [(cred:Moderator-GamesCorner) OR (cred:Owner-PostFromHannes) OR
            (cred:countMonth-memberOfGamblingSite > 3)]


Whenever someone requests access to Hannes’ post, the access control policy 
is evaluated according to the hierarchical order of content elements of the forum 
(cf. Table 2.1). In our example, step (1) ensures that authorised users are either an
administrator of the forum or, since we have chosen a public forum for the example,
any regular user. Step (2) specifies that users are allowed to read the topic “Games
Corner” if they are a moderator of this topic or any other user. The latter applies since
the example does not specify any restriction on topic level either. Step (3) ensures
that only users who are either moderator of the topic “Games Corner,” owner
of the thread, or member of a gambling website get read access to
the thread “Online Gambling.” Lastly, step (4) determines that only users who are
either moderator of the topic “Games Corner,” owner of the post, or member of a
gambling website for at least 3 months can read the post created by Hannes. Note
that read access to Hannes’ post is only granted if the whole policy (steps 1–4)
evaluates to “true.” Similar to this example for read access, further policies can
be defined in order to specify add, edit or delete rights for a resource. All users who
add a post to a particular thread have the opportunity to further restrict access to 
their own contribution. 
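
The evaluation of Table 2.1 can be written down as a conjunction over the four levels, each of which is a disjunction of alternatives. The Python sketch below is an illustration under our own assumptions: the reader's proofs are modelled as a dictionary mapping a credential name to True or, for the membership duration, to the lower bound the reader has proven, which leaves out the cryptography of anonymous credentials entirely. The threshold follows the prose (“at least 3 months”) and therefore uses >=, whereas Table 2.1 writes “> 3”. The names POLICY, satisfies and can_read are hypothetical.

```python
EVERYBODY = "everybody"  # the default alternative that any reader satisfies

# One entry per level of the hierarchy; all levels must hold (AND),
# and within a level a single alternative suffices (OR).
POLICY = [
    ("Forum", ["cred:Admin-Forum", EVERYBODY]),
    ("Topic", ["cred:Moderator-GamesCorner", EVERYBODY]),
    ("Thread", ["cred:Moderator-GamesCorner", "cred:Owner-OnlineGambling",
                "cred:memberOfGamblingSite"]),
    ("Post", ["cred:Moderator-GamesCorner", "cred:Owner-PostFromHannes",
              ("cred:countMonth-memberOfGamblingSite", 3)]),
]


def satisfies(alternative, proofs):
    """Check one alternative against the statements the reader has proven."""
    if alternative == EVERYBODY:
        return True
    if isinstance(alternative, tuple):
        name, minimum = alternative
        # proofs[name] is the proven lower bound, not the exact value.
        return proofs.get(name, 0) >= minimum
    return alternative in proofs


def can_read(proofs, policy=POLICY):
    """Grant read access only if every level has a satisfied alternative."""
    return all(any(satisfies(alt, proofs) for alt in alternatives)
               for _level, alternatives in policy)
```

A reader who proves membership of a gambling site for at least four months passes all four levels, while a member of one month is stopped at the post level. A moderator of “Games Corner” passes because the moderator credential appears at the thread and post levels and the forum and topic levels are open to everybody.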


2.3.5 Privacy-Awareness Information


Having described the realisation of the privacy-enhancing selective access control
extension, we now introduce a feature that helps users to be
aware of their privacy in the forum. More precisely, we developed a phpBB modification
(short: MOD) called “Personal Data” and integrated it into the forum prototype.
The idea behind the Personal Data MOD is to provide additional privacy-related
information in social software in order to raise users’ privacy awareness,
help them to better assess their potential audience and eventually enable them to
make informed decisions about whether to disclose personal data on the Internet. The
perception of privacy in social settings depends on the anonymity or identifiability of
the users on the one hand, and on the available audience, i.e., who may read and
reuse the disclosed personal data, on the other hand. Considering that privacy is only
a secondary task for users, the privacy-awareness information presented should be easy
and quick to understand and should not hinder social interaction and communication, the
primary tasks in social software.

The Personal Data MOD contains two categories of privacy-related information: 


Audience: Hints about who may access user-generated content, e.g., number and/or
properties of potential readers. This partly compensates for missing social and
context cues in computer-mediated communication [Dör08] and reminds users
in particular not to overlook the mass of “silent” forum readers.

Identifiability: Hints about potentially identifying data that is additionally
available, e.g., IP address or location information known to providers. This shows that
users are not completely anonymous on the Internet, and in particular in phpBB
forums, but that there are identifiers available.


In the prototype, the hint about the potential audience is coupled with the setting 
of the access control policies for read access. If no particular policy is specified for 
the corresponding forum element and the default policy of the higher-level content
element(s) states “allow everybody,” then the Personal Data MOD indicates “all
Internet users” as the potential audience for this post (Figure 2.9). However, if an
access control policy is set which restricts the potential audience, the MOD makes 
users aware of this fact as illustrated in Figure 2.10. 
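
The coupling between the read-access policy and the audience hint can be illustrated with a minimal Python sketch. The two hint texts are those visible in the prototype screenshots (Figures 2.9 and 2.10); the function name and the representation of policies as a list ordered from forum level down to the post, with `None` meaning "no policy set", are assumptions made for this example only.

```python
# Hypothetical sketch: derive the audience hint from the read-access policies.
ALLOW_EVERYBODY = "allow everybody"

HINT_PUBLIC = "Posts visible for all Internet users."
HINT_RESTRICTED = ("Posts visible for Internet users with certain "
                   "provable properties.")


def audience_hint(policies):
    """Return the hint text for a post.

    `policies` lists the read-access policy of each content element from the
    forum level down to the post; None means that no policy was specified.
    """
    restricted = any(
        policy is not None and policy != ALLOW_EVERYBODY
        for policy in policies
    )
    return HINT_RESTRICTED if restricted else HINT_PUBLIC
```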


2.3.6 User Survey 


Besides working on the implementation of the prototype for privacy-enhancing, se- 
lective access control, we wanted to evaluate whether our approach meets real forum 
users’ needs. Therefore we conducted an online survey. The survey was available 
in German and consisted of two parts: First, participants saw a realistic full-screen
screenshot of the phpBB forum prototype with two posts in a discussion thread 
about leisure-time physical activity as shown in Figure 2.11. 
Fig. 2.9: Screenshot prototype: Privacy-awareness information about potential
audience if access is not restricted (“Who can see my contributions? Posts visible
for all Internet users.”).

Fig. 2.10: Screenshot prototype: Privacy-awareness information about potential
audience if access is restricted (“Who can see my contributions? Posts visible for
Internet users with certain provable properties.”).


We instructed participants to imagine that they are the author of the first of the 
two contributions. An orange box on top contained either privacy-awareness hints 
or an advertisement. In the case that privacy-awareness hints were shown, partici- 
pants saw either textual information about the potential audience and their individual 
current location, or numerical information about the exact number of visitors of the 
forum within the last week and their individual IP address, or a combination of both. 
All participants of the survey were randomly assigned to one of the four groups. For 
the second part of the survey, all participants were shown the same online ques- 
tionnaire. The questionnaire contained questions about knowledge of technical- and 
privacy-related terms, use of the Internet in general and of forums in particular and 
questions related to audiences and access control. We also collected demographic 
data. A link to participate in the survey was distributed via blogs, mailing-lists and 
forums on the Internet. Due to this setup, we had a non-random sample as a basis 
for further analysis. After excluding answers from non-users of forums⁷ and from
participants who had not seriously answered the questionnaire, 313 valid responses
remained. In the following, we report selected relevant results based on those 313
participants. More details about methodology, analysis and further results are provided
in [PWG10].

First, to test participants’ knowledge and awareness of the potential audience, we 
asked them who actually has access to “their” forum post that they had seen previ- 
ously. Second, since we were also interested in participants’ ideas and requirements 
regarding access control, we further asked who they would intend to have access. 
Actually, the forum post that we showed to the participants was accessible to all
people with access to the Internet, i.e., all registered and unregistered users, forum
providers and Internet providers. The fact that the post from the example was
completely public could be learnt from the privacy-awareness display with the textual
information (shown for G2 and G3), and there was also a visual cue visible to
participants of all survey groups indicating that the post can be viewed without being
logged in, i.e., it is visible to everybody with Internet access.


7 Participants who stated in the questionnaire that they have never even read in a forum are consid- 
ered as non-users. 
Fig. 2.11: User interface for different survey groups (originally shown in German).
Panels: (a) Group G3 (audience, location, number of visitors and IP); (b) Group G2
(audience and location); (c) Group G1 (number of visitors and IP); (d) Group G0
(advertisement).



A comparison of the percentages of expected access vs intended access of
different audience groups, listed in Table 2.2, reveals that nearly all participants know
about and agree with access to all posts for registered members. Nearly all
participants also know that the forum provider has access, and three-quarters stated that
the forum provider should have access. Our results further show that the majority
of participants know that unregistered visitors can also see the post; however, only
about one-third would want unregistered people to view their posts. Hence, there
is a considerable difference between the percentage of participants who would let
registered users read their posts and those who would also allow unregistered users
access to their posts. This finding is interesting for two reasons: First, in most
forums on the Internet, anybody can easily become a registered member by providing
Table 2.2: Expected vs intended access to forum posts by different given audience
groups.

Audience groups          G0         G1         G2         G3         all
                         n=78       n=74       n=86       n=75       n=313
All registered users
  expected               96.15 %    97.30 %    95.35 %    97.33 %    96.49 %
  intended               89.74 %    100.00 %   96.51 %    96.00 %    95.53 %
Unregistered users
  expected               69.23 %    70.27 %    70.93 %    78.67 %    72.20 %
  intended               28.21 %    31.08 %    27.91 %    36.00 %    30.67 %
Forum provider
  expected               98.72 %    95.95 %    95.35 %    94.67 %    96.17 %
  intended               66.67 %    75.68 %    75.58 %    70.67 %    72.20 %
Internet provider
  expected               47.44 %    47.30 %    52.33 %    50.67 %    49.52 %
  intended                7.69 %    10.81 %    12.79 %    12.00 %    10.86 %

multiple choice questions with given answer categories



a fake e-mail address and choosing a password. This means that practically every
Internet user could gain access with little effort anyway, and from this point of
view there is no essential difference between registered and unregistered users.
Second, the result indicates that participants want to differentiate who can access
their forum posts and that their requirements do not match current access control
settings, which are defined by providers or administrators. Looking at the figures
for the particular survey groups, we found no statistically significant differences
between them (neither at the usual significance level of p < 0.05 nor at p < 0.1).
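
To make the kind of group comparison mentioned above concrete, the following Python sketch runs a Pearson chi-square test of independence on one row of Table 2.2 (intended access for unregistered users). Two assumptions need to be stated clearly: the text does not say which statistical test was used, so the chi-square test is merely a plausible choice, and the absolute counts are reconstructed here from the rounded percentages and group sizes. Under these assumptions the resulting p-value lies well above 0.1, which is in line with the reported absence of significant differences.

```python
import math

# Group sizes and percentage of participants per group who would intend
# unregistered users to have access (both taken from Table 2.2).
n = {"G0": 78, "G1": 74, "G2": 86, "G3": 75}
intended_pct = {"G0": 28.21, "G1": 31.08, "G2": 27.91, "G3": 36.00}

# Assumption: absolute counts reconstructed from the rounded percentages.
yes = {g: round(n[g] * intended_pct[g] / 100) for g in n}
no = {g: n[g] - yes[g] for g in n}

total = sum(n.values())
total_yes = sum(yes.values())
total_no = total - total_yes

# Pearson chi-square statistic for the 4 x 2 contingency table.
chi2 = 0.0
for g in n:
    for observed, column_total in ((yes[g], total_yes), (no[g], total_no)):
        expected = n[g] * column_total / total
        chi2 += (observed - expected) ** 2 / expected

# Closed-form survival function of the chi-square distribution with
# 3 degrees of freedom: df = (4 groups - 1) * (2 answers - 1).
p_value = (math.erfc(math.sqrt(chi2 / 2))
           + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))

print(f"chi2 = {chi2:.2f}, df = 3, p = {p_value:.2f}")
```
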
Besides deciding which of the four given audience groups is intended to have 
access to their forum posts, participants of the survey could specify other groups or 
share their thoughts about how access control should work in an additional free text 
field. Indeed, a dozen of the subjects took this opportunity to formulate ideas and 
said that they would like to authorise particular readers based on special properties 
or their relationship to them. A selection of comments, which underline real forum 
users’ needs for selective privacy-enhancing access control, is presented below. 


Selection of comments from participants to the question “Who would you intend
to access your forum contributions?” (originally posted in German):


C1: “persons I have chosen” 

C2: “authorised by me” 

C3: “circle of people that I have defined” 

C4: “members with at least 10 posts in the forum” 
C5: “friends” 

C6: “guests who I have invited” 
These requirements can be addressed with the selective access control extension
for phpBB forums. The extension enables users to define which properties someone 
has to possess — by showing either certified or self-issued credentials — for gaining 
access to users’ contributions. It is also conceivable to have credentials that prove an 
existing relationship, e.g., being friends in community X, and using these credentials 
for specifying access rules. Thereby it is possible to realise relationship-based access 
control with the selective access control extension without being restricted to it. 
As argued previously, access control based only on relationships is not suitable for
forums in general, since it requires that the author of a contribution and the users
to whom she wants to give access already know each other. This assumption does
not hold for web forums, where people with similar interests can meet and discuss
without knowing each other in person.


2.4 Concluding Remarks 


In this chapter, we have investigated privacy issues and issues surrounding the man- 
agement and expression of identities in social software, with a focus on social net- 
working sites and web forums. The main difference between the two example ap- 
plications is that in the first case users already have a connection with each other 
whereas in the latter case potential interaction partners are not necessarily known to 
each other, e.g., by their user names. We presented appropriate concepts to address 
privacy-related issues for both types of social software. 

First, in order to enhance users’ privacy on social networking sites, our solu- 
tion enables them to cluster their social contacts in their own ‘collections’ and we 
provide users with the possibility to create multiple ‘faces’, i.e., they can maintain 
multiple profile pages for their account. By defining which profile page should be 
displayed to the members of a specific collection, users are able to segregate their 
audiences and actively control who can access their personal data. In case the user 
does not trust the provider, she can further use Scramble! to encrypt all content and 
share decryption keys only with members of the intended audience. A precondition 
here is that users can address each other in order to exchange keys. 

Second, we have presented the concept and realisation of a web forum prototype 
for privacy-enhancing selective access control. This solution enables active forum 
users to specify (a set of) properties which members of their potential audience need 
to possess. This means that even if the author and the audience are not known to each 
other, the author controls who can access her contributions to some extent. With 
Personal Data MOD, a feedback mechanism to support users’ privacy awareness 
is integrated in the forum prototype. A user survey complements our work and has 
shown that the ideas behind the prototype meet real forum users’ needs towards 
user-centred, selective access control. 

If the developed selective access control features are used in a very restrictive 
way, social software users will experience a high level of privacy but a low amount 
of interactions. Vice versa, if the access control is handled very openly users could 
lose much of their privacy. Certainly, it would be inappropriate to attach strict access
control policies for every contribution in a forum or to encrypt each single piece 
of information on a social networking profile. However, having the user-centred se- 
lective access control at their disposal may encourage users to discuss issues, which 
they would not address in public forums, or to state unpopular and uncensored opin- 
ions to a specified audience. Neither the privacy-enhanced social networking site 
nor the web forum with selective access control will completely solve the conflict 
between sociability and privacy in social software. However, both applications
empower users to find their individual balance on a case-by-case basis.

From the providers’ point of view, it is important to note that privacy is a
highly debated topic and that providing social software with privacy-enhancing features
can be a very good selling point, especially with respect to people who have so far
refused to use social software because of privacy concerns.
Chapter 3
Trustworthiness of Online Content 


Jan Camenisch, Sandra Steinbrecher, Ronald Leenes, Stefanie Pötzsch, Benjamin
Kellermann, and Laura Klaming 


3.1 Introduction 


Some decades ago, content on which people base important judgments used to be
provided by relatively few institutional sources, such as encyclopedias. Since the 1990s
the Internet has become an invaluable source of information for a growing number of
people. While ten years ago web content was also provided only by a limited
number of institutions or individuals, today’s Web 2.0 technologies have enabled
nearly every web user to act not only as a consumer, but also as a producer of content.
User contribution is at the core of many services available on the Web and, as such,
is deeply built into those service architectures. Examples are wikis like Wikipedia,
which are entirely based on content contributed by multiple users and modifiable at
any time by any of them.

Individuals and organizations increasingly depend on this distributed informa- 
tion, but they face severe trust limitations. In the Web 1.0, it was already difficult to 
decide to which extent online sources could be trusted. With Web 2.0 the question of 
trust in online content becomes even more important: Users cannot be sure whether
a piece of information is correct, whether the information will be accessible in the
future, whether it is legal to use it, and who would assume liability in case the
information is incorrect. Users of the Web are not protected against lies and
misinformation; think of the recent cases of intentionally false articles in Wikipedia
(e.g., BBCNews¹), or stock price manipulations through misleading newsgroup postings
(e.g., CNet²).

In fact, with the highly dynamic information flow enabled by the Web, infor- 
mation often takes a life of its own as it can be, for example, published, edited, 
copied, aggregated, or syndicated; it eventually becomes detached from the context 
in which it was created and evolves separately. Users do not have cues to determine 
whether they can trust the information or not. Personal ad-hoc strategies to deal
with this, such as trusting only certain websites, are no longer appropriate when the 
information is dynamically assembled from multiple sources or is passed around, 
republished, or when the website itself does not perform editorial tasks but entirely 
relies on its users. 

In general, ‘trust’ is difficult to define precisely because implicit qualifiers are
often determined by the context in which the word is used, and because the word
is used to mean many different things (even within a single context). Definition is
further hindered by the gap between ‘trusted’ and
‘trustworthy’. When speaking about ‘trust in content’ we adopt a limited scope and 
take this to be ‘the belief that the information is true, accurate or at least as good as 
possible and the reliance upon this belief’. When speaking about ‘trustworthiness 
of content’ we mean that ‘the content satisfies a set of conditions (with implicit 
qualifiers) defined by another party to justify her well-founded trust in the content’. 

Trust in content is most often derived from trust in a person, entity, or process. As 
such, there needs to be a binding between content and the person, entity, or process 
from which trust may be derived. There are two standard approaches to address this 
binding. The first, more commonly used, consists of trusting the deliverer of the
content. Examples of such deliverers include online news sites, corporate Web sites,
and main entries in a blog. The second approach is to include meta-information along
with the content that the user may use to assess properties of the content. Such
meta-information includes digital signatures on the content itself or links to external
authoritative sources.

The point of ‘trust in content’ is enabling consumers to assess (correctly) the 
trustworthiness of content. Such enabling involves a combination of technical mech- 
anisms, psychological insights, and user education. 

Work on technical mechanisms and psychological insights derived from
experiments are described in the remainder of this chapter. We first investigate
the scenarios of wikis and blogs and derive requirements for technical mechanisms in
Section 3.2. As users wish both to be able to trust content and to protect their own
privacy, a balance needs to be struck between the two requirements. Based on this,
we arrive at technical mechanisms that do not attempt to make fully automatic trust
decisions on behalf of users, but instead present relevant (primary and secondary)
information to them in such a way that they can conveniently and efficiently
interpret it as part of the still mental task of arriving at final trust decisions. This
act of interpretation can then include whatever additional subjective considerations
users wish to apply. Which considerations users apply was studied in experiments
that we present in Section 3.3. These experiments have essentially shown that users
need education to make full use of the possibilities the Internet offers them to
establish trust. Finally, in Section 3.4 we present some technical solutions that try
to aid users in forming their human trust decisions; they do not replace or
incapacitate users and should all be accompanied by user education.
3.2 Scenarios and requirements


For the real-world scenarios of wikis and blogs, we sketch how online content is
used in different situations and which role the trustworthiness of content plays in
these settings. This provides the starting point for exploring a set of mechanisms
that allow for the realization of these and similar scenarios. From the mechanisms,
a number of more detailed requirements can be derived, which were used to build the
prototypes documented later in Section 3.4.


3.2.1 Scenarios 


There are numerous possible scenarios available on the web that have a need for 
trusted content. Our choice fell on wikis and blogs as they are already very compre- 
hensive. 


3.2.1.1 Blog 


A blog is a sequence of generally short articles, often called entries, published on
the Web. The entries are produced by an individual or collection of individuals who
are the blog’s author(s) and are often connected by a theme. The entries may consist
of text or multimedia, as in a video blog (vlog) or a photographic blog (photoblog).
We interpret the term blog in a relatively broad sense, i.e., not just encompassing
individuals’ online journals, but all content that is ‘pushed’ over RSS or Atom
protocols, and other similarly structured content (e.g., from electronic mailing lists)
that is easily transformed into the common format. The popularity of blogs as a
medium derives from the low cost of production and distribution of content. A
direct consequence of this is a large base of content producers and of topics addressed.
The issue of whether online information can be considered trustworthy is especially
urgent when new information arrives that has to be acted on quickly, such as blog
articles that convey important news. Often these articles are published by an initially
unfamiliar source. Examples are:


In-company weblog: Employees of a multinational company (or members of an- 
other organization) consume news from similar sources in the form of blogs. 
Not each individual participant may have the capacity to judge each piece of in- 
formation in its self-contained form (for a start, it may be phrased in a foreign 
language), yet the entire ‘crowd’ of members can form an enhanced overall view 
for ‘inside’ members on augmented ‘outside’ information. This scenario was in- 
vestigated with a demonstrator outlined in Section 3.4.1. 

(Medical) self-help: Private individuals who consume health-related information
(e.g., consider treatment options adjacent to interviews with their physicians),
and have an obvious, warranted interest that this information be trustworthy (e.g.,
‘Is it advertisement? Is it a rumor? What does my insurance company say about
it? What is it?’). This scenario points to the importance of including privacy- 
friendly technology. Experience shows that individuals somewhat differ in their 
judgments as to the most desirable and practical levels of privacy [Bri98] based 
on cultural background, politics, age, and other factors; yet privacy is generally 
held an indisputable right and value when it comes to information that concerns 
personal health. We use this scenario to investigate which meta-information con- 
sumers concentrate on, when making trust decisions on health-related informa- 
tion as outlined in Section 3.3.1. 


3.2.1.2 Wiki


A wiki is a collection of Web pages which can be edited online by its users. The
objective of a wiki is to support users in collaboratively creating and editing common
shared content. It is possible to link content and to comment on it. Wikis also
provide history functionality so that pages can easily be reset to former versions.
While some wikis are accessible and editable without authentication, others
can have fine-grained access control. A problem of information published in wiki
systems is that it can easily be manipulated and tampered with. An advantage
of a wiki system is that information can easily be corrected. The weakness
of the system is therefore also its strength, at least if the user base is sufficiently
large, committed and knowledgeable. To counter misuse or vandalism, most wikis
adopt the strategy of making damage easy to undo rather than attempting to
prevent it in the first place. A widely known example of this is the history function
with which one can restore pages to older versions. One of the major difficulties in
wikis is that it is hard to establish whether information is reliable. The reliability
of information depends on the author(s) of the content. Wikis may therefore adopt
different mechanisms to control who can create and edit information. One mechanism
is that of captchas in conjunction with a text edit field. A captcha is a type
of challenge-response test used to ensure that the response is not generated by a
computer. Other mechanisms introduce a waiting period before an editor can
contribute to a wiki, which aims at preventing spur-of-the-moment modifications. The
English Wikipedia, for instance, requires new users to wait at least four days before
they can contribute. This prevents, or at least delays, rogue automated programs from
making edits. Another example is the Portuguese Wikipedia, where a user has to make
a certain number of edits to prove his trustworthiness and usefulness as an editor.
The German version of Wikipedia is currently testing an extension called Flagged
Revisions which lets trustworthy authors assign sighted or certified tags. The
conditions for an author to be able to assign sighted tags are restrictive with respect
to both the number of days of having an active account and the number of edits.
The certified tag is work in progress.
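
The account-age and edit-count conditions mentioned above share a common shape, which the following Python sketch illustrates. It is not how any Wikipedia edition implements these checks: the class `Account`, the function `meets_thresholds` and its parameters are hypothetical. Only the four-day waiting period is taken from the text; any other threshold passed to the function is a placeholder chosen by the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    created: datetime   # when the account was registered
    edit_count: int     # number of edits made so far


def meets_thresholds(account, now, min_age=timedelta(days=4), min_edits=0):
    """Return True if the account is old enough and has made enough edits.

    The default of four days corresponds to the waiting period mentioned in
    the text; `min_edits` is a placeholder for edit-count conditions.
    """
    old_enough = now - account.created >= min_age
    experienced = account.edit_count >= min_edits
    return old_enough and experienced
```

A stricter rule, such as the one for assigning sighted tags, would simply call the same function with larger values for `min_age` and `min_edits`.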

We investigate this scenario both with a prototype in Section 3.4.3 and a related 
experiment described in Section 3.3.2. 
3.2.2 High-level mechanisms


By providing consumers with a technological means for viewing the primary
information online not in isolation but in the context of related assessments by others
whom they are acquainted with, and who in turn may be better acquainted with the
primary information, we can facilitate more educated trust decisions that are of benefit
to consumers. Trust is ultimately a personal decision. Different individuals may make
different choices even when presented with the same ‘objective’ evidence, and not 
everybody is able or even willing to express what exact considerations go into their 
respective trust decisions. 


3.2.2.1 Core mechanisms 


What technical mechanisms can do on a functional level to assist users is to ensure 
that a user that consumes meta-data can objectively know that it is related, who au- 
thored it (in an absolute or pseudonymous sense), and that it has not been tampered 
with. We can distinguish the following mechanisms on a functional level: 


Evaluating trustworthiness: This refers to the process of condensing all available 
meta-data (such as ratings) that belongs to a piece of information. It forms part of 
a process that ultimately leads to a binary trust decision on the information whose 
trustworthiness is under consideration. As an intermediary step, a numeric score 
for a piece of content may be calculated automatically, which users may then 
base their final judgement on. 

User reputations and certifications: The assessment of trusted content depends on
who provided a piece of information (or who provided secondary information
about it). Users can collect so-called certifications and ratings that are aggregated
to a reputation. User reputations serve to characterize users in ways that
are meaningful to other users when it comes to judging them as sources (e.g.,
highly reliable source on a scale from 1 to 5).

Binding metadata to data: The trust model assumes that when a user does not 
know whether to trust some piece of information, she can triangulate by tak- 
ing other, secondary information (meta-data, such as ratings) from other users 
into account. This entails mechanisms for strong bindings of information to its 
pursuant meta-data. 
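
A strong binding between a piece of information and its meta-data can be sketched as follows. This is a simplified illustration, not the mechanism used in the prototypes: the function names and record layout are hypothetical, and an HMAC with a shared key stands in for the digital signature that a real system would use, since the Python standard library offers no public-key signatures. The record carries a hash of the content, so a verifier can check that the rating relates to this content, who issued it (possibly under a pseudonym) and that it has not been tampered with.

```python
import hashlib
import hmac
import json


def bind_rating(content: bytes, rating: int, rater: str, key: bytes) -> dict:
    """Create a rating record bound to `content` via its SHA-256 digest."""
    record = {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "rating": rating,
        "rater": rater,  # may be a pseudonym
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    # Assumption: HMAC is only a shared-key stand-in for a real signature.
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_rating(content: bytes, record: dict, key: bytes) -> bool:
    """Check that the record is untampered and belongs to `content`."""
    unsigned = {k: v for k, v in record.items() if k != "tag"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    tag_ok = hmac.compare_digest(expected, record.get("tag", ""))
    content_ok = (unsigned.get("content_sha256")
                  == hashlib.sha256(content).hexdigest())
    return tag_ok and content_ok
```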


3.2.2.2 Supportive means 


Besides the functional mechanisms, users need supportive means to deal with the
meta-data they receive and to provide this meta-data to others:


Privacy-friendly incentive system: Providing enough users with sufficient
incentives for making their efforts on behalf of other users worthwhile is a known
challenge to collaborative schemes such as the intended solution. A privacy-friendly
incentive scheme supports one form of (stimulating) incentives that can help
mitigate some related problems.

Trust policies: These are artifacts that allow users to declare conditions on
meta-data (such as ratings or scores) in order for the information to be regarded
trustworthy, or before information can be transformed during its life-cycle.

Anonymous networks: Cryptographic schemes that rely on zero-knowledge proofs
specify which information protocols must and must not carry such that
participants can communicate securely, yet do not inadvertently exchange redundant
identifying information. In practice, this must be combined with the use of
communication channels that do not expose their identities simply in the form of
persistent network addresses or other path information. Anonymous networks
serve this purpose.
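
As an illustration of the trust policies listed above, the following Python sketch declares two conditions on rating meta-data and checks whether a piece of information meets them. The class `TrustPolicy`, its two fields and the function `is_trustworthy` are hypothetical; actual trust policies may express quite different conditions. Note that the binary outcome only reflects the conditions a user has declared and does not replace the user's own final judgement.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TrustPolicy:
    min_ratings: int       # at least this many ratings must be available
    min_mean_score: float  # and their mean must reach this value


def is_trustworthy(ratings, policy):
    """Apply the declared conditions to the ratings of one piece of content."""
    if not ratings or len(ratings) < policy.min_ratings:
        return False
    return mean(ratings) >= policy.min_mean_score
```

For example, with `TrustPolicy(min_ratings=3, min_mean_score=4.0)` an article rated `[5, 4, 4]` satisfies the policy, while one rated `[5, 5]` does not, because too few ratings are available.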


3.2.3 Requirements of mechanisms 


In the following we make a first iteration of detailing the core mechanisms described
above. Of the supportive means, we chose the privacy-friendly incentive system
for this iteration.


The following requirements are a prerequisite for all mechanisms:


Open API: The system should offer external interfaces that respect relevant open
standards for web and service interfaces, such that it can be coupled to existing
applications in a straightforward manner. In the case of web applications this can
have obvious advantages in terms of potential size of user community, remote
access, etc.

Efficiency: The system should employ an efficient representation of its (crypto- 
graphic) artifacts, both in terms of their in-memory representation and resulting 
requirements on surrounding protocols. 

Scope: The system should be applicable to a wide range of target applications, 
e.g., by using a decomposition into (a smaller group of) components that are 
specific to a (set of) application(s) and (a larger group of) general components 
that can serve any target application. 


3.2.3.1 User reputation and certification 


A user reputation and certification system comprises the following mechanisms: 


A user-rating mechanism: While our primary focus is content, users also can be
ranked (providing them with a reputation). This mechanism allows a party or process
to specify the atomic rating of an individual or organisation (who/which
produced the content).

A rating aggregation algorithm: A rating algorithm aggregates individual ratings
into one single reputation. It may allow weighting a set of individual object or
entity ratings, which would require weight factors to be specifiable/specified.

Certification: When entities rate content, the relying parties should be able to trust
that the ratings are actually provided by the legitimate raters. Certification of the
raters can warrant this property. Certificates are basically digital signatures
issued by a third party that is trusted by the user and that verifies that a public
key is owned by a particular party.

Web of Trust: This is a decentralized concept based on recommendation that is 
used to establish the authenticity of the binding between a public key and a user, 
for example, a PGP trust model. A network of trust can be generated using such 
a model. This can be contrasted with the centralized or hierarchical relationship 
between certification authorities that exists in X.509. 

A mechanism to propagate ratings: Ratings are propagated over some kind of rating
network. A mechanism which models this network and the messages exchanged
is needed for rating systems.

A mechanism to store reputation: There are different ways to store reputations,
e.g., they may be stored in a decentralized manner on the user side or centrally at
a reputation server.
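
As an illustration of the rating aggregation algorithm listed above, the following minimal sketch computes a single reputation as a weighted mean of individual ratings. The function name, the numeric rating scale and the choice of a weighted arithmetic mean are assumptions made for illustration only; the text does not prescribe a particular aggregation function.

```python
from typing import Iterable, Tuple


def aggregate_reputation(ratings: Iterable[Tuple[float, float]]) -> float:
    """Aggregate (rating, weight) pairs into one reputation value.

    Hypothetical sketch: a weighted arithmetic mean. Weights must be
    non-negative and at least one weight must be positive.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rating, weight in ratings:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        weighted_sum += rating * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no ratings with positive weight")
    return weighted_sum / total_weight
```

With equal weights this reduces to the plain average; larger weights could, for instance, be given to ratings from certified raters.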


These mechanisms have to meet the following requirements: 


Authentication of parties: Users want to both demonstrate that they can be trusted
and also ensure that the parties they deal with are trustworthy. 

Completeness of reputation: Users want the aggregated reputation to consider all 
ratings given. During the storage and propagation of reputation it should not be 
possible for the entities involved to omit certain ratings. 

Pseudonymity of authors and raters: Users want to rate and provide web content
under a pseudonym so that others cannot necessarily link this rating to their real
name. In the real world there are also authors who write under a pseudonym, and
many services on the Internet also allow the use of pseudonyms instead of real
names following EC Directive 95/46 [95/46/EC].

Anonymity of users: Users want to evaluate reputation anonymously to prevent 
others from building personal behavior profiles of their possible interests. 

Persistence of reputation: The same reputation should be available for all pseudo- 
nyms a user uses in a context. 

Self-determination of shown reputation: If there exist only few authors with the
same reputation, these authors are easily linkable despite using different
pseudonyms because of the same reputation value. Thus, authors should get the
possibility to determine how much of their positive reputation they show. Negative
reputation must not be omitted.

Transparency: The reputation algorithm must be able to show how an aggregated
rating was derived on the basis of individual ratings. The system has to be
designed in such a way that the user may check the integrity of single ratings as
well as the integrity of the reputation.


3.2.3.2 Binding Metadata to Data


Secure metadata support comprises the following mechanisms: 


A mechanism for combining a piece of data and its metadata in a secure manner:
This mechanism ensures that the content and the metadata remain associated, that is,
that it is impossible to tamper with the content without this being reflected by the
metadata. This can, for instance, be achieved by forming a signature of the whole.

A mechanism for checking that metadata belongs to its data: This mechanism
allows the relying party to check whether the metadata actually concerns the data
to which it is purportedly associated. This could be accomplished by offering the
relying party a mechanism for checking the signature of the combined bundle.

A mechanism for reliably referring to single instances of a piece of data: 
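
The first two mechanisms listed above (combining data with its metadata, and checking that metadata belongs to its data) can be sketched as follows. This is a minimal sketch under stated assumptions: it uses an HMAC from the Python standard library as a symmetric stand-in for the digital signature mentioned in the text, and the function names `bind` and `verify` as well as the JSON serialization of metadata are hypothetical choices. A real deployment would use a public-key signature scheme so that relying parties do not need the signing key.

```python
import hashlib
import hmac
import json


def _bundle_bytes(data: bytes, metadata: dict) -> bytes:
    """Serialize data and metadata into one unambiguous byte string."""
    meta = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Length-prefix the data so the boundary between both parts is unambiguous.
    return len(data).to_bytes(8, "big") + data + meta


def bind(data: bytes, metadata: dict, key: bytes) -> str:
    """Return a tag over the whole bundle (stand-in for a signature)."""
    return hmac.new(key, _bundle_bytes(data, metadata), hashlib.sha256).hexdigest()


def verify(data: bytes, metadata: dict, key: bytes, tag: str) -> bool:
    """Check that the metadata belongs to the data it is presented with."""
    return hmac.compare_digest(bind(data, metadata, key), tag)
```

Because the tag covers the whole bundle, changing either the content or the metadata makes verification fail.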


These mechanisms have to meet the following requirements: 


Integrity: The system must ensure that the combined bundle of data and its meta- 
data is safe from unauthorized modification. 

Non-repudiation: The system must ensure that the effective combination of data 
and its metadata cannot be denied by a user who created a signature on the bun- 
dle. This requirement may conflict with the requirement of authors being able to 
contribute and rate pseudonymously in the system. The harmonisation of these 
two requirements requires special attention. 

Normalization: The system should be able to normalize both data and metadata
to account for the fact that (semantically) equivalent forms may be represented
by different byte strings, yet should lead to the same signature values.

Transparency: The mechanism for reliably referring to single instances of a piece
of data should respect existing conventions for data references in general. (This
can, e.g., be achieved by forming URLs under a new scheme.)
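
The normalization requirement can be illustrated with a small sketch. The concrete rules used here (Unicode NFC composition and collapsing runs of whitespace) are assumptions chosen for illustration; the text does not specify which normalization rules the system applies.

```python
import hashlib
import unicodedata


def normalize(text: str) -> bytes:
    """Map equivalent textual forms to one byte string (illustrative rules)."""
    composed = unicodedata.normalize("NFC", text)  # unify composed/decomposed characters
    collapsed = " ".join(composed.split())         # collapse and trim whitespace
    return collapsed.encode("utf-8")


def digest(text: str) -> str:
    """Hash the normalized form; this value would then be signed."""
    return hashlib.sha256(normalize(text)).hexdigest()
```

Two byte-wise different but equivalent inputs, such as a precomposed "é" versus "e" followed by a combining accent, then lead to the same digest and hence to the same signature value.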


3.2.3.3 Evaluating Trustworthiness (or any other property of content) 


A trust evaluation system for content comprises the following mechanisms: 


A mechanism to request content to be evaluated: This mechanism allows a user 
to specify that certain content needs to be evaluated in terms of trustworthiness, 
integrity, validity, relevance, etc. The requester may associate a reward or incen- 
tive to the fulfillment of the request. The incentives or rewards may be specified 
in terms of privacy-friendly incentive points (see supportive measures). 

A rating mechanism: This mechanism allows a party or process to specify the
atomic rating of particular content (i.e., the content-rating). The rating may be
based on the entity-reputation of an individual or organisation who/which
produced the content, or on certain qualities of the content (content-rating) as
assessed by the rater or the rating process (in the case of, e.g., text analysis).

An aggregation algorithm: A content rating aggregation algorithm aggregates
individual ratings into one single content quality rating. It may allow weighting of
single ratings based on a set of individual content ratings, which would require
weight factors to be specifiable/specified.

A mechanism to propagate ratings: Ratings are propagated over some kind of rating
network. A mechanism which models this network and the messages exchanged
is needed for rating systems.

A mechanism to store ratings: Most likely the content ratings are stored on the 
content server. 


As with the user reputation and certification system, these mechanisms have
to meet the following requirements:


Availability of reputation and ratings: As a functional requirement, each user of 
the rating system should be able to access reputations and ratings to estimate the 
quality of web content. 

Integrity of web content and ratings: Users want web content, ratings and reputa- 
tion to be preserved from manipulations, both in propagation and in storage. 

Accountability of authors and raters: Users want the authors and raters of content
to be accountable for the web content they provided or rated, respectively. This
requirement may conflict with the requirement of authors being able to contribute
and rate pseudonymously in the system.

Completeness of reputation: (same as in user reputation) 

Pseudonymity of raters: (same as in user reputation) 

Unlinkability of ratings and web content: Users want to rate and provide different 
web content without being linkable. Otherwise behavior profiles of pseudonyms 
(e.g., time and frequency of web site visits, valuation of and interest in specific 
items) could be built. If the pseudonym can be linked to a real name the profile 
can be related to this real name as well. 

Anonymity of users: (same as in user reputation) 

Confidentiality of ratings: Although a reputation system’s functional requirement
is to collect and provide information about a reputation object, raters might prefer
to provide only a subset of their ratings to a specific group of other users while
keeping them confidential from all others.

Liveliness: The system may allow existing content ratings to be replaced by novel 
ratings. This may even be required on the basis of new information, for instance 
when a rater turns out to have provided unwarranted ratings. 


3.2.3.4 Privacy-Friendly Incentive System 


A suitable privacy-friendly incentive system comprises the following mechanisms: 


Obtaining privacy-friendly incentive points for circulation: This mechanism allows 
users to obtain a collection of privacy-friendly incentive points from a reserve. 
Points in this collection will effectively enter circulation, and the reserve will 
enforce an overall policy on the flow of incentive points (e.g., maximum issued 
number linked to monetary equivalents in users’ accounts). 


70 J. Camenisch, S. Steinbrecher, R. Leenes, S. Pötzsch, B. Kellermann, L. Klaming


Exchanging privacy-friendly incentive points: This allows a party to offer incentive
points for certain online transactions and to transfer those points to another
party once a transaction has occurred.

Removing privacy-friendly incentive points from circulation: This allows parties
to return privacy-friendly incentive points to a reserve. Such points will be
withdrawn from circulation, and the submitting user will typically receive some
suitable form of other compensation (e.g., monetary deposit to her account).


These mechanisms have to meet the following requirements: 


Pseudonymity: Users must be able to offer and receive privacy-friendly incentives 
under pseudonyms and without the need to reveal their real identities. 

Double-spending: The system must be able to detect when users try to cheat by
spending the same privacy-friendly incentive points on multiple parallel occasions
(i.e., they overspend). Double-spending must lead to certain disciplinary
measures, such as revealing the user's identity to warn against future misuse.

Accountability: It must be possible to hold parties accountable for the actions taken
within the scopes of defined mechanisms. For instance, this must be true with
regard to the exchange of pending privacy-friendly incentive points, or with regard
to disciplining users because of their alleged double-spending.

Unlinkability: The system must ensure that uses of different privacy-friendly
incentive points remain unlinkable, as long as users spend them responsibly (i.e.,
do not overspend).

Off-line: The system should support off-line use of privacy-friendly incentive
points, i.e., two users can exchange such points without a central party (typically
the reserve who issued points in the first place) having to become involved.
Especially in an off-line scenario it has to be ensured that double-spending is not
possible.

Distribution of concerns: The incentive system should allow parties to store their
digital artifacts (e.g., privacy-friendly incentive points) locally, and should not
introduce unnecessary assumptions for central storage or other processing at a
single location. In case of local storage of the digital artifacts, loss of these
artifacts is a concern. Should the system be capable of re-issuing lost credits?
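
The life-cycle of incentive points (entering circulation, being exchanged, leaving circulation) and the double-spending requirement can be sketched as follows. This is a deliberately simplified, on-line sketch: the class and method names are hypothetical, points are modelled as random serial numbers, and the reserve detects double-spending when the same point is returned twice. It does not model pseudonymity, unlinkability or off-line use, which would require cryptographic techniques beyond this illustration.

```python
import secrets
from typing import List, Set


class DoubleSpendingError(Exception):
    """Raised when a point is returned to the reserve a second time."""


class Reserve:
    """Hypothetical reserve that issues and redeems incentive points."""

    def __init__(self, max_in_circulation: int) -> None:
        self.max_in_circulation = max_in_circulation
        self._circulating: Set[str] = set()
        self._redeemed: Set[str] = set()

    def issue(self, count: int) -> List[str]:
        """Put `count` new points into circulation, enforcing the issuing policy."""
        if len(self._circulating) + count > self.max_in_circulation:
            raise ValueError("issuing policy exceeded")
        points = [secrets.token_hex(16) for _ in range(count)]
        self._circulating.update(points)
        return points

    def redeem(self, point: str) -> None:
        """Withdraw a point from circulation and detect double-spending."""
        if point in self._redeemed:
            raise DoubleSpendingError(point)
        if point not in self._circulating:
            raise ValueError("unknown point")
        self._circulating.remove(point)
        self._redeemed.add(point)
```

In this sketch a transfer between two parties is simply handing over the serial number; the receiving party redeems it at the reserve, and a second attempt to redeem the same point is rejected.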


3.3 Experiments 


In the previous section requirements and mechanisms were sketched that may be 
used to help internet users assess the trustworthiness of online content. Before we 
describe in Section 3.4 how these mechanisms can be implemented technically we 
evaluate additional requirements from practical user experiments. 

The first experiment we describe in Section 3.3.1 relates to ‘Binding metadata
to data’ (Section 3.2.3.2). We want to know which metadata is useful to function
as trust markers. Although, as we mentioned earlier, trust ultimately is a personal
decision, there are of course patterns, and some data are more relevant trust markers
than others.

The second experiment we describe in Section 3.3.2 relates to ‘User reputation
and certification’ (Section 3.2.3.1). Our goal was to find out how private users
consider their reputation and other attributes to be. Based on this we can suggest how
to make a trade-off between the metadata other users want to know about content
and the trust information others are willing to reveal.


3.3.1 Binding metadata to data 


The first experiment aimed at better understanding the criteria employed by internet 
users in order to determine which information to trust and which not to trust. 

In order to learn more about internet users’ mental trust models and what people 
consider to be relevant cues regarding content quality and trustworthiness, and how 
content evaluators handle rating content, we have conducted a few experiments. 
Questions guiding this research were: 


• What are relevant properties to be rated? What are the most salient features of
content to call it trustworthy (e.g., validity, accuracy, completeness)? Should the
quality be associated with the object-quality score (as author reputation is
confined to the domain at hand), or will this be unmanageable by end-users (raters
and readers)?

• What are relevant author properties to be rated?

• What binding is required between content and author or rater? Math proofs
provided by math professors are likely valued higher than those provided by math
students, but this does not say anything about the professor’s reputation regarding
fast cars.


3.3.1.1 Findings 


Research on credibility assessment of information found online generally demon- 
strates that factors pertaining to characteristics of the content, i.e., usefulness and 
accuracy of the information, factors pertaining to authority, i.e., trustworthiness of 
the source, as well as factors pertaining to the presentation of the information play 
key roles in people’s credibility assessments [Met07, Rie02, EK02, FSD+03, FC01].
However, there appears to be a discrepancy between indicators people believe they
use in order to appraise information they find online and indicators they actually
use when assessing the credibility of information [EK02]. Internet users typically
indicate that their judgements of website credibility are affected by the identity of
the source and scientific references. However, results from observational studies
demonstrated that people rarely pay attention to these factors and generally spend
little time evaluating the quality of information [EK02, FM00, FSD+03]. Instead, it
seems that the presentation of information and the design of websites are the most
important determinants of internet users’ credibility assessments. On the one hand,
this finding might be somewhat distressing because people might be easily misled
by appealing web designs and consequently trust information that may be of little
value and low quality. On the other hand, and from a privacy protection point
of view, the finding that information about the source has little impact on internet 
users’ credibility assessments implies that information about the identity of the au- 
thor does not need to be disclosed on websites. If internet users pay little attention 
to features of the source of the information they read on a website, the absence of 
this indicator will not interfere with their credibility assessments. Consequently, in- 
formation lacking author specifications will not be regarded as less credible. This 
assumption is counterintuitive, especially when keeping in mind that people believe 
that information about the source is the primary factor influencing their credibility 
assessments. To explore this assumption in more detail, we have conducted an ex- 
periment in which we tested which indicators determine whether or not people find 
information trustworthy and what role author specifications have in this process. 


3.3.1.2 Experiment and questionnaire 


The experiment and questionnaire were designed to explore which indicators peo- 
ple view as credible when searching information on a wiki. The test subjects were 
presented with a mock-up of a medical wiki that had to be used to answer questions
about an unfamiliar topic (5-HT or serotonin). The test subjects could enter
search terms in the wiki in order to find information to answer the questions. After 
entering a search term into the wiki, participants received a list of search results in 
a random sequence. The list consisted of six search results, each providing a few 
random words about 5-HT along with information about the author of the text (trust 
indicators). Each of the results bore one of the following trust markers: (1) the name 
of the author, (2) the title and name of the author, (3) the occupational title of the 
author, (4) the title and reputation of the author, (5) a certification of the author, 
and (6) a reputation measure for the website. Participants could then choose one 
of the hits and received a text that contained information they could use to answer 
all questions about 5-HT; the text was slightly different for every hit but included 
the exact same information. Each text was associated with one indicator, i.e., for each
subject the text presented for a given indicator (such as name of the author) was
always the same. After having received the search list generated by the wiki, subjects
could select a hit from this list and read it and then return to the search results list 
to choose other hits. In addition, they were free to enter as many new search queries 
as they wanted. Each time participants returned to the list with search results or
entered a new search query, they received four questions concerning the expertise
and trustworthiness of the source and three questions referring to the quality of
the information, using 10-point Likert scales (with items: competence,
knowledgeability, objectivity, truthfulness, accuracy, usefulness, and
comprehensiveness). The procedure therefore was: select hit, read text, answer the 7 information


3 Trustworthiness of Online Content 73 


credibility questions and return to the search results list to repeat the procedure, or 
answer the 3 questions concerning 5-HT. 

After submitting their answers to the 5-HT questions, subjects received a ques- 
tionnaire about search strategies. The purpose of this questionnaire was twofold. 
First, one question about reasons for selecting one or more additional hits during 
the task was integrated into the questionnaire as an additional credibility measure. 
Second, the questionnaire was designed to generally measure people’s strategies for 
searching information on the internet. 

The test subjects in the experiment (256 students at Tilburg University, TU Dres- 
den, National University Ireland Galway, and ETH Zurich, resulting in 172 useable 
response sets) appear to favour the first search result in the search results list, irre- 
spective of the source of the information. The findings of the experiment demon- 
strate that internet users’ credibility assessments are mainly influenced by the po- 
sition of information in a search list. The position of a hit in a search list was the 
most important indicator followed by information about the occupational title of 
the source. Personal information about the identity of the author was not a particu- 
larly relevant indicator for trustworthiness, at least not when compared to position 
in the search list and occupational title of the author. Personal information about 
the author, such as his name, became a more important indicator as people selected 
more than three hits from the search list. Information about the occupation or
reputation of the author is more relevant than his or her name. In addition to these
indicators, a reputation measure of a website was found to influence people’s credi- 
bility assessments, whereas a certification such as the one used in the present study 
(author is a member of the American Medical Association), does not seem to be a 
valuable indicator for credibility. When looking more closely at what the subjects 
say about the quality of the information it appears that the test subjects believe that 
information that is provided along with the occupational title of the author has a 
higher quality than information that is provided along with a certification of the au- 
thor regardless of the actual quality of the information. It also appears that position 
of a hit in a search list generated by a search engine is the most important indicator 
for its trustworthiness for people’s first search, whereas indicators providing infor- 
mation about the source become more important than the position for the subjects’ 
subsequent searches. The main reason for participants to visit more than one search 
result was to look for confirmation of the information already read on the first entry. 
While these findings demonstrate that people have a strong tendency to rely on the
position of a hit in a search list, participants indicated that, in general, professional
writing, message relevance and the absence of typos were the most important indicators
for trustworthiness of information they found online. In line with the findings,
participants indicated that in general information about the source was of relatively
little importance in terms of credibility. Actually, presence of contact information,
author identification and author qualifications and credentials were rated as the least
important indicators for the reliability of information found online. Interestingly, only 17.9% of
the participants indicated that the position of a search result in the list was very im- 
portant. When comparing the actual behaviour of the subjects with their beliefs, it 
becomes clear that people believe the position of a hit in a search list is less
important for their decision to select a hit than it actually is. This discrepancy between
indicators people believe they use in order to appraise online information and indi- 
cators they actually use when assessing the credibility of information is in line with 
previous research findings. However, in contrast to previous findings, participants 
did not indicate that they found information about the author very important, while 
this information did affect their actual behaviour, albeit to a lesser extent than the 
position of a hit. Taken together, the findings demonstrate that our test subjects be- 
lieve they base their decisions whether to choose a hit and rely on the information 
they find on the internet on characteristics of the content, while actually, conve- 
nience, i.e., the position of a hit in the search engine output, mainly determines their 
behaviour. 


3.3.2 User Reputation and Certification 


According to [Ada99], it is more important that what is deemed sensitive or personal
data is based on the perception of the individual than on whether the data can be
evaluated by third parties (e.g., lawyers, computer specialists). Considering that
individuals often claim to have a great need for privacy but behave differently (cf.
privacy paradox [Pöt09]), we decided to conduct a study with an experimental part to
learn how users actually treat their own reputation value compared to other personal
data items. In the following, we briefly outline the setup of the study and report key
results. This experiment is also published as an outcome of PrimeLife in [KPS11].


3.3.2.1 Study Design 


The web-based study consisted of an experiment and a questionnaire. Invitations 
to participate in the study were posted in several forums and blogs on the Internet 
and we also distributed flyers in the university library. All participants who com- 
pleted the study were offered the chance to win vouchers for an online shop. For the 
experiment, all participants were asked to rate the same articles from a wiki about 
books and literature according to three given categories. Before participants actu- 
ally accessed the wiki articles, they did a short literature quiz. By answering four 
multiple choice questions about famous writers and books, they received between 
zero and four points. These points are considered as a subject’s reputation. Subjects 
were further asked to indicate name, age and place of residence. When rating the 
wiki articles subsequently, each participant decided whether her

• name,
• age,
• place of residence and/or
• reputation

should be published together with her rating of a wiki article.

Half of the participants were randomly assigned to the experimental group. The
experimental group was shown privacy-awareness information, i.e., information
about who can see what data about the user, together with each wiki article. The
other half of the subjects belonged to the control group and did not receive privacy- 
awareness information. 

After finishing this first part of the study, all participants filled in the
questionnaire. In this questionnaire, we asked about the perceived level of privacy in the
wiki, about experience with wikis, rating systems and the Internet in general. We
used questions from the applied privacy concerns and protection scale [Tad10] to 
investigate general caution, privacy concerns and applied protection measures. Fi- 
nally, we asked about demographic data and whether subjects had given their real 
name, place of residence and age at the beginning. 

We calculated the Perceived Privacy Index (PPX)³ from participants’ answers to
the questions about how public, private, anonymous and identifiable they felt in the 
wiki. Each item was measured on a 0 to 100% slider scale. The higher the PPX 
value, the more private a subject felt. 
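
The computation of the PPX from the four slider items can be sketched as follows. The footnote states which items are inverted, but the exact combination rule is not given in the text; this sketch assumes a plain sum of the four items (range 0 to 400), which is a labeled assumption rather than the authors' documented formula. The function name is hypothetical.

```python
def perceived_privacy_index(public: float, private: float,
                            anonymous: float, identifiable: float) -> float:
    """Combine four 0-100 slider values into one index (assumed: plain sum).

    "public" and "identifiable" are inverted, so that a higher index
    always means the subject felt more private.
    """
    for value in (public, private, anonymous, identifiable):
        if not 0 <= value <= 100:
            raise ValueError("slider values must lie between 0 and 100")
    return (100 - public) + private + anonymous + (100 - identifiable)
```

Under this assumption, a subject who feels not at all public or identifiable and fully private and anonymous reaches the maximum of 400.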


3.3.2.2 Results 


After excluding complete data sets from a few subjects who admitted to not having
participated seriously in the study, 186 valid responses remained and were used for
further analysis.

30 % of the subjects agreed to publish their real name together with the rating
of a wiki article. The disclosure of their real age was acceptable to 57 %, that of
their real place of residence to 55 %, and 63 % agreed to have their reputation value
published. This means that for each data item there was a considerable share of subjects who wished
to keep this information private. If participants indicated later in the questionnaire 
that they did not provide true information in one of the first three categories, we 
treated this data item as not disclosed. Since the reputation value was calculated 
from answers in the literature quiz, lying was impossible. 

Further, we used a linear regression model to calculate how the disclosure of
these four data items and a few other factors influenced users’ perceived privacy in
the wiki. The results are listed in Table 3.1 and reveal that there are only two factors
that significantly decreased the perceived privacy: the fact that a user has published
her name and the fact that a user has published her reputation value. While it is
not surprising that a user feels less private after disclosing her real name, we found
that disclosing the reputation value had a similar effect on perceived privacy.
According to the results, the reputation value is deemed an even more sensitive
piece of data than age or place of residence. Application-independent measures,
i.e., privacy concerns, general caution and technical protection measures, did not
play a significant role for perceived privacy in the wiki.

³ The questionnaire contained the question “Please indicate to which extent the
following adjectives describe your feelings while using the wiki: 0 % (not at all) -
[adjective] - 100 % (very much)?” (originally asked in German). The PPX is composed
of the adjectives “public” (scale inverted), “private”, “anonymous”, “identifiable”
(scale inverted).


Table 3.1: Regression model, n=186.


Perceived Privacy Index PPX (dependent var.)      Est.   Std. err.   p
Intercept                                       288.93       33.47
Application-specific predictors
  Privacy-awareness information available         4.57       12.01   0.704
  Name published                                -46.66       14.49   0.002**
  Age published                                 -13.54       16.77   0.420
  Place of residence published                  -21.65       16.06   0.179
  Reputation value published                    -39.99       14.04   0.005**
General predictors
  Privacy concerns                               -1.35        1.10   0.223
  General caution                                 0.22        1.79   0.902
  Technical protection                           -0.47        1.47   0.750


sign. levels: *** p < 0.001, ** p < 0.01, * p < 0.05
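
To make the table easier to read, the following sketch evaluates the fitted linear model for a given combination of predictors. The coefficients are taken from Table 3.1; the dictionary keys and the function name are hypothetical, and treating omitted predictors as zero is a simplification for illustration (the scales of the general predictors are not given here).

```python
# Estimates from Table 3.1 (dependent variable: PPX).
COEFFICIENTS = {
    "intercept": 288.93,
    "awareness_info": 4.57,
    "name_published": -46.66,
    "age_published": -13.54,
    "residence_published": -21.65,
    "reputation_published": -39.99,
    "privacy_concerns": -1.35,
    "general_caution": 0.22,
    "technical_protection": -0.47,
}


def predicted_ppx(predictors: dict) -> float:
    """Evaluate the linear model; omitted predictors count as 0."""
    value = COEFFICIENTS["intercept"]
    for name, x in predictors.items():
        if name == "intercept" or name not in COEFFICIENTS:
            raise KeyError(name)
        value += COEFFICIENTS[name] * x
    return value
```

For example, with all other predictors held fixed, publishing both name and reputation value lowers the predicted index by 46.66 + 39.99 = 86.65 points.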


Altogether, the results of our study underline that a user’s reputation value has 
to be treated as a personal data item. That means that in a reputation system, users 
should have the possibility to keep their reputation private, or to disclose only an 
approximated value. 
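
One way to disclose only an approximated reputation value is to publish a range instead of the exact number. The following is a minimal sketch of that idea; the function name and the bucketing rule are assumptions for illustration, and the sketch does not address the earlier requirement that negative reputation must not be omitted.

```python
from typing import Optional, Tuple


def disclosed_reputation(value: int,
                         granularity: Optional[int]) -> Optional[Tuple[int, int]]:
    """Return the (low, high) range shown to others, or None if kept private.

    granularity=None keeps the reputation private; granularity=1 reveals
    the exact value; larger values reveal only a coarser range.
    """
    if granularity is None:
        return None
    if granularity < 1:
        raise ValueError("granularity must be at least 1")
    low = (value // granularity) * granularity
    return (low, low + granularity - 1)
```

A coarser range places a user in a larger group of users with the same visible reputation, which makes linking pseudonyms via the reputation value harder.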


3.4 Demonstrators 


In the following we give brief summaries of the demonstrators we built corresponding
to the mechanisms described in Section 3.2.2. The demonstrators can be
used for either the wiki or blog scenario outlined in Section 3.2.1.


3.4.1 Trustworthy Blogging 


Blogs are a representative type of Internet content that is continuously updated by 
many individuals and organizations. Updates occur by adding new time-stamped 
articles. For instance, news headlines and reports from news organizations are now 
commonly available as blogs, and numerous individuals are maintaining what resemble
online diaries in the form of blogs. We interpret the term blogs in a relatively
broad sense, i.e., not just encompassing individuals’ online journals, but all content
that is “pushed” over RSS or Atom protocols, and other similarly structured content
(e.g., from electronic mailing lists) that is easily transformed into the common
format.

The issue of whether online information can be considered trustworthy is espe- 
cially urgent when new information arrives that has to be acted on quickly. This may 
well be the case with blog articles that can convey important political, economic, or 
other news.


3.4.1.1 Main Idea 


Information found on blogs is often commented upon by other users. The main
idea of the demonstrator is to present such comments to users when they read a
blog article. That is, the users not only see the blog article but their browser also
presents them with comments on this article by other users and information about
these other users. Thus, by also reading this secondary information they can better
assess the trustworthiness of the blog article. This of course requires that the Internet
is searched for such comments and that (reputation) information about the users who
provided these comments is available.

The demonstrator implements this idea on the IBM corporate intranet. As there
exists a central directory that provides information about each employee, the
demonstrator focuses on finding and indexing comments and making them available
to the users. In particular, the issue of how to deal with the reputation of
commenting users is not considered. The demonstrator further offers users the
possibility to provide their own comments on articles they have read.


3.4.1.2 Demonstrator Architecture and Mechanisms 


The demonstrator consists of a central server collecting and indexing comments and 
of components that display information to the users. The latter include a Firefox
plugin that, apart from displaying a web site, e.g., a blog entry, also displays the
related meta-information such as comments and identity information about the commenting
users. It also offers readers a means to submit their own comments on a blog
entry they have read to the demonstrator server.

The central server provides two main functionalities. First, it is a service that
readers can query for meta-information w.r.t. a piece of information, i.e., a blog article
they are reading. The meta-information includes comments on a blog that the demonstrator
has found by crawling the net as well as comments and annotations that
readers submit once they have read an article. The second functionality is to collect
this meta-information and to maintain an index.

Most of the mechanisms needed for the implementation are rather straightforward
and we do not describe them here. The main technical challenge was to find a mechanism
to bind information (e.g., an article) to its metadata (comments but also all
other information about the article such as its author or source). In general, we
cannot assume that pieces of information (text) have a unique identifier
such as a URL. To solve this, we have introduced the concept of a bound URI
(BURI). A BURI is computed from a pair consisting of information and its metadata.
The computation of a BURI involves normalizing the information to give it
a unique representation, versioning it, and then binding the information and the
meta-information together (e.g., by a digital signature of the originator of the
meta-information).
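
The three steps of the BURI computation (normalizing, versioning, binding) can be illustrated with a small Python sketch. The concrete BURI format is not specified here, so everything below is an assumption made for illustration: the `buri:` prefix, the function names, the normalization rule (Unicode NFKC plus collapsed whitespace), and the use of an HMAC under a key held by the originator as a stand-in for a real digital signature.

```python
import hashlib
import hmac
import json
import unicodedata


def normalize(text: str) -> str:
    """Give the text a unique representation (assumed rule: NFKC, collapsed whitespace)."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def content_id(text: str) -> str:
    """Hash of the normalized text; an edited text yields a new id, i.e., a new version."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def make_buri(text: str, metadata: dict, originator_key: bytes) -> str:
    """Bind metadata to one version of the text (the HMAC stands in for a signature)."""
    cid = content_id(text)
    meta = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(originator_key, f"{cid}|{meta}".encode("utf-8"), hashlib.sha256)
    return f"buri:{cid}:{tag.hexdigest()}"


def verify_buri(buri: str, text: str, metadata: dict, originator_key: bytes) -> bool:
    """Recompute the BURI and compare it with the given one."""
    return hmac.compare_digest(buri, make_buri(text, metadata, originator_key))
```

Because the identifier is derived from the normalized text rather than from a URL, the same article found at two locations maps to the same BURI, while any change to the text or to the metadata makes verification fail.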


3.4.1.3 Learnings 


Users generally find the presentation of the collected meta-information very helpful
to assess blog articles. However, the motivation to offer comments themselves seems 
to be rather low. To address this, we have developed a privacy friendly incentive 
system that is described in the next section. 

We finally note that a central server that provides users with meta-information can
potentially track which articles which users read. This could be alleviated by having
the users request this service via an anonymous network. An alternative method
is to use mechanisms that hide the query from the server. Also, if the providers of 
a blog would mirror the related meta-information, this problem would not occur to 
start with. 


3.4.2 Encouraging Comments with Incentives 


User-generated content often varies in quality and accuracy. Its trustworthiness as 
well as its quality can be significantly improved by (expert) reviews and comments. 
As most scientists know, good reviews are time-consuming, that is, come at a cost. 
Even though community service out of idealism is a common trait for instance in the 
Wikipedia community, incentive systems can improve the situation for contributors 
as well as for the contributed content. They aim at reimbursing the review or revision 
cost by awards, and at invigorating the review process. 

Privacy-friendly incentives complement this fundamental goal with anonymity 
and privacy protection for all users. Therefore, they enable a double-blind peer re- 
view process and nurture fairness, impartiality, and rigor. Authors as well as the 
reviewers of documents can remain anonymous during the entire review process. 
Such a blind review process is believed to be essential for high (academic) quality 
and honest comments, even though it sometimes lacks in reviewer accountability. 

Our goal is to establish a cryptographic system that reaches high quality stan- 
dards, while fulfilling the diverse requirements of the involved parties. 

We formalize the incentive system as a collaborative document editing system,
in which all revisions, reviews and comments are linked to one initial document
$P_0$. We consider a document version history $P = \{P_0, \ldots, P_n\}$ as an ordered sequence of
revisions, reviews and comments associated with $P_0$, where $P_n$ denotes the most
recent revision or review.


3.4.2.1 Principals


There are multiple parties interacting with a document P. We have a clearing house 
that hosts all documents and organizes the incentive system, in our case the wiki W 
component. The wiki has a community of users and each user U may act in different 
and multiple roles: 


Reader U: A reader consumes a document P. Any reader may offer incentives to 
other users to improve the quality of a document by a review or a revision. 

Author V: An author contributes an initial version or a revision of a document P. 

Reviewer R: A reviewer contributes reviews and comments for a document P in 
exchange for receiving an incentive. 

Editor E: An editor is a privileged user, who may approve or decline document 
revisions or reviews by authors and reviewers. 


We introduce a bank B to exchange electronic incentives for real-world goods and 
awards. Users of wiki W can withdraw fresh incentive e-coins and deposit spent 
ones as part of our virtual incentive economy. Even though we allow a system with 
full anonymity, we require each user to register with a trusted identity issuer I to
infuse accountability in the entire review and incentive process. Each user U obtains
an identity certificate $\sigma_U$ on its identity $sk_U$ from issuer I. Although our system works
with multiple banks as well as multiple identity issuers, we focus on the
single-bank/single-issuer case for simplicity. The identity of an honest user is never
revealed by the incentive system, whereas the certified identity enforces separation of
duties between authors and reviewers, and prevents double-spending attacks as well 
as vandalism. 


3.4.2.2 Concepts 


In a privacy-friendly incentive system, many anonymous users interact with a single 
document P. Incentives may be given before or after a contribution (revision or 
review). Pre-contribution incentives are offered to users for providing a contribution
at all and are independent of the contribution quality. For instance, a reader U
can offer incentive e-coins to any reviewer R who is willing to contribute a review.
Post-contribution incentives are offered after the contribution is made and may
depend on the quality of the contribution. For instance, users can rate the quality
of a reviewer’s contribution and offer reputation e-coins for his work.

In our model, a reader U explicitly withdraws incentives from a bank B. The 
reader U offers these pre-contribution incentives on the wiki W for improvements 
on a document P. The wiki W acts as a clearing house and it is responsible for 
ensuring unlinkability by exchanging the spent incentives of reader U with bank B 
for fresh incentives. Once a reviewer R decides to contribute a review P’, he sub- 
mits the review to the wiki W for inspection by an editor E. Once the editor E 
approves the review, the reviewer R can obtain the incentives from the wiki W. As
a post-contribution extension, the number of incentives obtained can depend
on the review rating, or the reviewer can obtain separate reputation e-coins
to build a reputation credential.
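
The message flow described above (withdraw, offer, exchange, review, approval, payout) can be traced with a toy Python simulation. This is only a sketch under strong assumptions: coins are plain random serial numbers rather than anonymous e-coins, no cryptography is involved, and the class and method names (`Bank`, `Wiki`, `offer`, `claim`, and so on) are hypothetical and not taken from the prototype.

```python
import secrets


class Bank:
    """Issues e-coins and rejects coins that were already deposited."""

    def __init__(self):
        self.issued = set()
        self.spent = set()

    def withdraw(self, n):
        coins = [secrets.token_hex(8) for _ in range(n)]
        self.issued.update(coins)
        return coins

    def exchange(self, coins):
        """Deposit spent coins and hand out the same number of fresh ones."""
        for coin in coins:
            if coin not in self.issued or coin in self.spent:
                raise ValueError("invalid or double-spent coin")
            self.spent.add(coin)
        return self.withdraw(len(coins))


class Wiki:
    """Clearing house: holds offered incentives until an editor approves a review."""

    def __init__(self, bank):
        self.bank = bank
        self.escrow = {}      # document id -> fresh coins on offer
        self.pending = {}     # review id -> document id
        self.approved = set()

    def offer(self, doc, coins):
        # Exchanging the reader's coins for fresh ones breaks the link to the reader.
        self.escrow.setdefault(doc, []).extend(self.bank.exchange(coins))

    def submit_review(self, doc, text):
        review_id = secrets.token_hex(4)
        self.pending[review_id] = doc
        return review_id

    def approve(self, review_id):
        self.approved.add(review_id)

    def claim(self, review_id):
        if review_id not in self.approved:
            raise PermissionError("review not approved by an editor")
        self.approved.discard(review_id)
        return self.escrow.pop(self.pending.pop(review_id), [])
```

In this sketch the reviewer receives coins with serial numbers different from those the reader withdrew, which mirrors the role of the wiki in ensuring unlinkability, and a second deposit of the reader's original coins is rejected by the bank.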


3.4.2.3 Checks and Balances 


The privacy-friendly incentive system provides anonymity to all users and balances 
this property with strong accountability safe-guards. In a fully anonymous system 
without such safe-guards, malicious users could attempt to manipulate reviews, 
sabotage other authors’ work or publish fabrications without accountability. Well-known
examples of checks and balances to counter those attacks are the separation
of reviewer and author/editor, or the binding of reviews and documents to the
contributor’s true identity. 

To achieve accountability as well as separation of duties between roles, we introduce
a cryptographic domain pseudonym $N_{P,U}$ for each user U that interacts with a
document P. It is a function of the user’s true identity $sk_U$ and the document P while
hiding $sk_U$ computationally. Therefore, each entity interacting with document P has
one unique pseudonym, which is independent of the entity’s role. Pseudonyms $N_{P,U}$
and $N_{Q,U}$ created for different documents P and Q are unlinkable.
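
The properties of a domain pseudonym (deterministic per user and document, independent of the role, different across documents) can be illustrated with a minimal Python sketch. It uses HMAC-SHA256 keyed with the user's secret as an assumed stand-in; the actual construction is not given in this section, and a real system would additionally need a pseudonym that can be proven, without revealing the secret, to be derived from the certified identity. The function names are hypothetical.

```python
import hashlib
import hmac


def domain_pseudonym(sk_user: bytes, document_id: str) -> str:
    """Pseudonym for one (user, document) pair; HMAC-SHA256 is an assumed stand-in."""
    return hmac.new(sk_user, document_id.encode("utf-8"), hashlib.sha256).hexdigest()


def separation_of_duties_ok(author_nym: str, reviewer_nym: str) -> bool:
    """A review is acceptable only if its pseudonym differs from the author's."""
    return author_nym != reviewer_nym
```

Since the pseudonym does not depend on the role, a user who authored a revision of P and then tries to review it presents the same pseudonym twice, so the check above fails. Pseudonyms of the same user for documents P and Q look unrelated to anyone who does not know the secret.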


3.4.3 Author reputation system and trust evaluation of content in 
MediaWiki 


3.4.3.1 Architecture 


MediaWiki, the software used by Wikipedia, is probably the most widely used wiki
software. Therefore, the author reputation system was implemented
as an extension for this application. In the following we outline how two of the
core mechanisms from Section 3.2.2 can be implemented for the wiki scenario from 
Section 3.2.1. The requirements and design for this prototype are also published as 
a result of PrimeLife in [KPS11]. 


User reputations and certifications: For the credibility of authors, an author reputation
system assigns author reputation to authors. This is done initially by using
certifications users obtained outside the system (e.g., a master’s degree to show
expertise in computer science) and transferring them to a reputation value in the
author reputation system. Our reputation system allows different fields
of expertise to be set up, and users can have different pseudonyms and different reputations
in these fields. We make use of the identity management system developed by
the PRIME project⁴ (PRIME) for assisting the user in showing pseudonyms and
certifications. PRIME was extended to support showing reputation. After registration, a
user’s reputation is influenced by the ratings other users give to the content he
creates within the wiki system. Our reputation system uses the natural numbers
(including 0) as the set of possible reputation values. As users manage their reputation
on their own, they are able to omit single ratings. To prevent users from omitting
negative ratings, our system uses only positive ratings, so omitting any rating
is a disadvantage.

⁴ www.prime-project.eu

Evaluating trustworthiness: A content rating system allows readers of content to
judge its quality and give it a rating. The content rating system collects all
ratings given, aggregates them to a reputation of the content and shows it together
with the content to possible future readers. How strongly a rating influences
the aggregation depends on the reputation the rater shows
about himself. The problem with wikis is that information changes frequently.
The reputation extension is derived from the ReaderFeedback extension for MediaWiki.⁵
Using a wiki as implementation platform brought in additional issues,
such as content having several authors and the existence of different versions of content
that do not all get rated. Our content rating system uses one to five stars as
the possible ratings content might get.
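
How the reputation shown by a rater can influence the aggregation may be sketched as a weighted mean. The aggregation algorithm of the prototype is not spelled out here, so the weighting below (weight = 1 + shown reputation, so that a rater who shows no reputation still counts) and the function name are assumptions made purely for illustration.

```python
def aggregate_content_reputation(ratings):
    """ratings: iterable of (stars, shown_reputation) pairs.

    Returns the reputation-weighted mean of the star ratings, or None if
    there are no ratings.
    """
    total = 0.0
    weight_sum = 0.0
    for stars, shown_reputation in ratings:
        if not 1 <= stars <= 5:
            raise ValueError("ratings use one to five stars")
        if shown_reputation < 0:
            raise ValueError("reputation values are natural numbers")
        weight = 1 + shown_reputation  # assumed weighting, not the prototype's
        total += weight * stars
        weight_sum += weight
    return total / weight_sum if weight_sum else None
```

With two raters who show no reputation, five stars and one star average to three. If the first rater instead shows a reputation of 9, the aggregate moves towards that rater's five stars.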


This means that our overall system consists of the following parts: 


• the user side with the PRIME version allowing for reputations installed,
• PRIME certification authorities for issuing credentials/certifications,
• the wiki server with the PrimeLife-ReaderFeedback-extension.


3.4.3.2 Functionality 


In the following we describe the basic functionality of the system: 


Fetching Initial Reputation 


Authors may start work with an initial reputation. That is, proofs of competence
certified by an authorized institution can be brought into the work with the wiki
by using certifications that have been issued to the user. This is done by showing
anonymous credentials with PRIME to the wiki server. From this, the wiki calculates
the author’s reputation value.


Passive Usage 


When browsing a wiki page that has been rated with the help of the reputation extension,
the user will see the reputation of the content of this page in the form of one to
five stars.

The reputation shown may not be the reputation calculated for the latest revision
of a page. This is because the latest revision may not yet have received any ratings,
which are necessary for the calculation of the reputation value. In that case,
the most recent calculable reputation value is displayed. If a user wants to know more
details of a reputation value, the history of single ratings from which the reputation value
was calculated can be shown as well (Fig. 3.1).

⁵ http://www.mediawiki.org/wiki/Extension:ReaderFeedback
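
The fallback to the most recent calculable reputation value can be sketched in a few lines of Python. The data layout (a list of revisions ordered from oldest to newest, each with its star ratings), the function name, and the use of a plain mean are assumptions for illustration only.

```python
def displayed_reputation(revisions):
    """revisions: list of (revision_id, star_ratings), ordered oldest to newest.

    Returns (revision_id, mean_stars) for the newest revision that has at
    least one rating, or None if no revision has been rated yet.
    """
    for revision_id, stars in reversed(revisions):
        if stars:
            return revision_id, sum(stars) / len(stars)
    return None
```

If the latest revision has no ratings, the value of the newest rated revision is returned instead, together with the revision it belongs to.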


[Table with the columns Revision, Author, and one ratings column per rater (here 141.76.46.77 and 141.76.46.54); the rows list page revisions from 20 November 2009 to 14 December 2009 with the star ratings given to each.]


Fig. 3.1: Interface showing the Reputation and Rating History. 


The tabular representation contains much information in one view. The rater’s
reputation is shown at the top of the table below the name or IP address of the rater. The
different icons represent the type of reputation that was shown (e.g., the syringes
represent a certain reputation in the medical field). The stars below the raters are the
ratings that were given to a single revision of the page. If an author indicated his
reputation when submitting an edited page, this reputation is shown beneath
the author’s name or IP address.


Editing Wiki Pages 


When editing a page, a user is asked if he wants to send his reputation value. This 
reputation value is needed to calculate the reputation of the page afterwards. The 
higher the reputation value of the author is, the more impact it will have on the 
reputation value of the page. To show reputation, a user presents a credential. We
make use of credentials that allow greater-than proofs so that an author can decide
how much reputation he reveals depending on his wish for anonymity.
For example, with a reputation value of 63, an author may prove that he has a value
greater than 20 or greater than 50. The higher the proven reputation value is, the more impact
it will have on the reputation value of the page, but as the set of authors shrinks with
increasing reputation value, the anonymity set of the author shrinks as well.
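
The trade-off between impact and anonymity can be made concrete with a short Python sketch that counts, for each threshold a user could prove, how many users would be able to make the same statement. The population of reputation values and the function names are hypothetical; only the example of a user with reputation 63 and the thresholds 20 and 50 come from the text.

```python
def anonymity_set_size(all_reputations, threshold):
    """Number of users who could prove 'reputation > threshold'."""
    return sum(1 for r in all_reputations if r > threshold)


def disclosure_options(own_reputation, all_reputations, thresholds):
    """Map each threshold the user can honestly prove to its anonymity set size."""
    return {
        t: anonymity_set_size(all_reputations, t)
        for t in sorted(thresholds)
        if own_reputation > t
    }


# Hypothetical population of reputation values, including our user with 63.
population = [5, 12, 25, 40, 55, 63, 70]
options = disclosure_options(63, population, [20, 50, 65])
```

Here `options` maps the threshold 20 to an anonymity set of five users and the threshold 50 to a set of three, while the threshold 65 is not offered because the user cannot prove it.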

As every user has to decide on this trade-off on his own, a so-called “Send Personal
Data Dialogue” asks the user for his reputation value and tries to display the
trade-off in a graphical way. This dialogue is shown in Fig. 3.2a.
Additionally, the type of the reputation is important for the calculation. While the
topic of a page does not change, the author may have several reputation credentials.
For example, a surgeon may edit a page about gynecology. The reputation
credential on surgery may have more impact on the gynecology article than a
reputation credential dedicated to dentistry. However, having a credential on gynecology
would have the most impact. An author is not restricted to showing a credential of
his specific reputation type. Within the issuing process he automatically obtains more general
credentials (e.g., when being issued a credential about gynecology, one obtains a
credential about medicine and a general reputation credential as well). When asked
for his credential, the user may decide whether he shows the specific credential (which has
more impact on the page reputation) or the more general one (with the
benefit that the anonymity set is larger).
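
The automatic issuing of more general credentials, and its effect on the anonymity set, can be sketched as a walk up a field hierarchy. The hierarchy table, the user names, and the function names below are hypothetical, and the sketch does not model how related two specific fields (such as surgery and gynecology) are to each other.

```python
# Assumed field hierarchy: each field points to its more general parent.
PARENT = {
    "gynecology": "medicine",
    "surgery": "medicine",
    "medicine": "general",
    "general": None,
}


def issued_credentials(field):
    """Fields for which credentials are obtained when `field` is certified."""
    chain = []
    while field is not None:
        chain.append(field)
        field = PARENT[field]
    return chain


def anonymity_sets(certified_fields):
    """certified_fields: dict user -> certified field.

    Returns, per field, how many users hold a credential for that field.
    """
    counts = {}
    for field in certified_fields.values():
        for f in issued_credentials(field):
            counts[f] = counts.get(f, 0) + 1
    return counts
```

A user certified in gynecology holds credentials for gynecology, medicine, and the general type. Every holder of a specific credential also holds the more general ones, so the anonymity set can only grow when moving up the hierarchy.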

In addition to his reputation value and type, the user may send an identifier.
This allows his own reputation value to increase
whenever other raters give a high rating to the page. However, giving an identifier
of course makes the user linkable. The decision about sending the identifier is made
with the checkbox shown at the bottom of Fig. 3.2a.

The identifier has to be shown again when the user wants to fetch a reputation
update later. Fig. 3.2b shows this dialogue.


[Screenshots of the PRIME “Send Personal Data” dialogue. In (a), the user can attach a reputation to an edit by selecting the type of reputation (e.g., General, Medicine, Internist, Computer science), with the options ranging from high impact on the average article rating and low anonymity to low impact and high anonymity, and can tick “Send my identifier”. In (b), the user provides the old reputation and the identifier in order to update the reputation.]

(a) Editing a page (b) Updating the Reputation


Fig. 3.2: Customized Send Personal Data dialogue. 


Rating Pages 


In addition to editing, users can actively influence the reputation value of a page
by rating it. For this purpose, a rating form is shown to the user at the bottom of each
page. With this form a user can give one to five stars for each of four criteria (currently
reliability, completeness, neutrality, presentation).

Similar to editing, the user is asked for his reputation value when he wants to
submit his rating, using a dialogue similar to the one shown in Fig. 3.2a. As stated
in the last section, several
properties of the rater’s reputation value influence both the impact on the page reputation
and the anonymity of the rater. Therefore, raters have the same choice
as authors between a large anonymity set and a high impact of their rating on the
reputation value of the page.


3.4.3.3 Lessons learned 


The prototype provides a comprehensive framework for creating author reputation
and content evaluation in a wiki, taking into account the comprehensive and partly
contradictory requirements such a system has. This concept could also be applied to
other applications or across applications, especially to all applications that are
PRIME-enabled. However, our first user evaluation has shown that the overall framework is too
comprehensive for end users to understand at first sight and to make use of all features,
especially the privacy features.


3.5 Conclusive Remarks 


On the one hand, the need for establishing trust in online content is obvious. Our
experiments showed that the kind of meta-information users want in order to establish
trust differs, and that many users are not aware of which indicators they actually
use to establish trust, as indicated by their behavior. On the other hand, our second
experiment has shown that users who actively contribute to the trustworthiness of online
content by giving ratings to content have privacy concerns about their user reputation.

We presented a set of functional mechanisms with requirements that help to
establish trust in content, namely ‘User reputation and certification’, ‘Binding Metadata
to Data’, and ‘Evaluating trustworthiness’. Additionally, we gave ‘privacy-friendly
incentives’ as supportive means with
requirements. These mechanisms were implemented as prototypes for the two investigated
scenarios, wikis and blogs.


4.1 Challenges and Requirements


Much research and development has been done during the past couple of years to as- 
sist users in managing their partial identities in the digital world by several types of 
identity management [BMH05]. A comprehensive privacy-enhancing identity management
system would include the following components [CK01]:


• an Identity Manager (IdM) on the user’s side;
• IdM support in applications (e.g., at content providers, web shops, etc.);
• various third-party services (e.g., certification authorities, identity providers).


However, current concepts for identity management systems implicitly focus on 
the present (including the near future and recent past) only. The sensitivity of many 
identity attributes and the need to protect them throughout a human being’s entire 
lifespan is currently not dealt with. The digital lifespan is the range of time from the 
emergence of the first information that is related to the human being until the point 
in time when no more personal data is generated: from the moment of birth until 
death. Hence, lifespan refers to the temporal aspects of privacy and identity management
and, in particular, to the challenges involved in realising (privacy-related)
protection goals over very long periods of time. The area of challenges regarding
privacy is vast, even when not considering an entire lifespan (see, e.g., [ENI08]).
In the following, we describe which additional problems arise for the lifelong
protection of individuals’ privacy in a technology-based society.


4.1.1 Dealing with Dynamics 


Our society, as well as the individuals that form it, is subject to dynamics. We distinguish
between dynamics in the surroundings of the individual and dynamics in
the individual’s ability or willingness to manage her private sphere on her own, as
outlined in the following subsections [CHP+09].


4.1.1.1 Dynamics in the surroundings of the individual 


The dynamics of the effects from the outside world (possibly affecting the individual’s
private sphere) comprise, among others, technological developments,
replacement of administration, changes in law and policies, and, last but not least,
the evolution of society.

At the very least, we have to deal with the increasing processing of personal
data during one’s lifetime. This involves the disclosure of personal data to many
data controllers, partially because the disclosure and processing of data is officially 
required (e.g., because of school attendance, tax liability), partially because the data 
are needed to fulfil tasks in the areas of e-commerce, leisure, communication etc. 
Fig. 4.1 shows a simplified model of increasing data disclosure to different data 
controllers, depicted by coloured stripes. The lighter colours on the right-hand side 
express that the personal data are not needed anymore for the task to be fulfilled, but 
the data may still live on in official archives, at Internet services, or in the storage 
of communication partners [MS09]. The data might not be deleted after the time of 
death nor after the funeral. 

The coloured stripes in Fig. 4.1 might also correspond to several partial identities 
[HPS08], for example (but not exclusively) individuals in different areas of their 
life. Areas of life are sufficiently distinct domains of social interactions that fulfil a 
particular purpose (for the data subject) or function (for society). Formal areas of 
life include education, work, and health care. Informal areas of life cover mainly a 
user’s social network including family, friends, leisure, religion etc. Some of these 
informal areas might become formal by institutionalisation, e.g., for religion in the 
form of membership in a church. 

Another dynamic results from the development in technology including possible 
risks. The technological progress of the last decades triggers the transformation of 
our society towards a computerised social community highly dependent on informa- 
tion. The more structures of our society depend on information, the more important 
is the role that data plays in our everyday lives. During decades of technological 
evolution, several methods were invented for storing data in various forms and on 
various types of media. However, processes and events in nature, in society,
and in the life of the data subject cause failures, which might lead to loss of the
data during the lifetime of the data subject. Even if unwanted data loss might not
be a common phenomenon encountered within the life of every data subject, it may
become an evident and serious problem over the extent of a lifetime.

Privacy becomes an increasing problem, e.g., unauthorised access to personal
data which enables attackers to read them, link them with other data, or modify
them. Personal data, which are assumed to be secure at a specific point in time, may
be at risk after some time if no additional precautions have been taken.


Fig. 4.1: Accumulation of personal data and formation of partial identities.


Also, in the political and societal area, dynamics are to be expected during a 
period of several decades. In particular, it is not predictable how people in a leading 
role will interpret today’s personal data or which policies will apply. 

Lifelong privacy mechanisms need to cover not only the near past and future of 
an individual, but also need to consider the future prospects for a human’s lifetime 
and also beyond, which means about 90 years. The problem is that we only have 
experience with computer-based technology for about half of this time, and with the 
growing exchange of computer data over the Internet, even less than a quarter of this 
time. This means that all privacy mechanisms (including public-key cryptography 
invented in 1976 based on cryptographic assumptions) could not be tested in practice 
for a whole lifetime yet. For the selection of privacy technology (hardware and 
software), attention should be paid to the following aspects: 


• The duration of cryptographic security (based on cryptographic assumptions)
should be at least an individual’s lifetime. If unconditional security (not relying
on cryptographic assumptions, but just on pure probability theory) is possible
and feasible, it should be provided.

• Migration of data between different hardware and software needs to be assured.

When considering the long-term risks in an unpredictable setting, the sensitivity
of personal data is of utmost importance. For the choice and protection level of
personal data processed, a categorisation regarding their sensitivity with respect
to privacy has to be made according to [HPS08, CHP+09].


4.1.1.2 Dynamics in the individual’s ability or willingness of managing her
private sphere on her own.


During their lifetime, individuals pass through different stages. A stage of life is 
defined as follows: 


A stage of life of an individual with respect to handling her privacy is a period in her 
life in which her ability to manage her private sphere remains between defined boundaries 
characterising this stage of life. 


The management of one’s private sphere comprises the ability to understand privacy- 
and security-relevant aspects concerning one’s private sphere, the ability to (re-)act 
accordingly, and the ability to use appropriate (often ICT-based) means for one’s 
(re-)actions. Obviously toddlers cannot manage their private sphere on their own, 
nor can people suffering from pronounced dementia or those in a coma. Even for 
those who are mentally able to manage their private sphere, it may not be feasible if 
it requires using technical devices. 

Three large stages of life that individuals typically run through are childhood, 
adulthood and old age, which are depicted in the example shown in Fig. 4.2. It is 
quite clear that a baby is physically less able than a 10-year-old to interact with 
technical devices. So the ability of a child to manage her private sphere and her 
right to be consulted usually increase with her age. However, at least small children 
are not able to decide on their own how their data are created and processed and 
how their private sphere can be controlled. Adults, too, may temporarily or
permanently need others to support them or even to act on their behalf in
decisions concerning their private sphere. This is true especially in old age. For small
children, as well as for very old people, and in the case of emergency, delegation of 
the right to manage one’s private sphere is needed. For children, these delegates are 
automatically their parents; in case of emergency or for old people it might be a 
close relative. 


[Figure showing the ability to manage one’s privacy across the stages of life; recoverable labels: Childhood, Adulthood, Legal guardian necessary.]

Fig. 4.2: Stages of life: variation in the ability to manage one’s privacy.

Ability in the legal sense would not be a piecewise linear function as in Fig. 4.2, but
more step-like, and would then remain constant until legal guardianship is needed again.

Sometimes, individuals who in principle are able to manage their privacy on 
their own want to involve other parties or delegate their privacy control, e.g., for 
convenience reasons. These individuals may have the ability, but not the willingness 
to manage their private spheres on their own. Furthermore, both the ability and the 
willingness might change depending on the circumstances an individual is in and 
especially depending on what possible data controllers offer to her. 

A privacy-enhancing identity management system should support the delegation 
of duties and authorities. There are three possible situations that might occur re- 
garding delegation from the legal perspective: Firstly, delegation of rights might be 
made by law automatically for a certain time frame (e.g., for children to their par- 
ents). Secondly, delegation might be made willingly by an individual to others for 
a certain time frame (e.g., delivering mail to others during holidays). Thirdly, del- 
egation of rights of an individual might be initiated by other individuals to achieve 
delegation of her rights to them or others (e.g., in the case of incapacitating a per- 
son), which presumably requires thorough juridical investigation before divesting 
the person of a right. Delegation can be implemented by different means. Usually,
the delegate does not take over the identity of the individual concerned, but receives
authorisations to act (often within defined ranges) on the individual’s behalf or as
inheritor, respectively. Technical processes for delegation and digital estate have to be defined
in accordance with legal procedures. We come back to this issue in Section 4.1.3.


4.1.2 Digital Footprint 


The difficulties regarding partial identities and identity management in digital envi- 
ronments have another important complicating factor. Next to the partial identities 
consciously created by individuals to perform actions on the Web, huge amounts of 
data are collected just because of web activities or other electronic transactions. Ev- 
ery interaction with a communication device, consciously or unconsciously, leads 
to a data log; a digital trace. By accumulating these data and making connections 
between the data, extensive digital footprints of individuals can be created. 

Digital footprints are data that accumulate in information systems and can be 
indicated as belonging to one individual. This means that data in this sense is not 
restricted to personal data only. We outline in the following which data can be sen- 
sitive and how it can be linked. 


4.1.2.1 Sensitive Attributes 


Some attributes and attribute values usually need more privacy protection than others.
According to [HPS08, CHP+09], we distinguish the following properties of
identity attributes, which, alone or in combination, pose specific risks to privacy
when being disclosed:


• Data may be static, or changes are quite accurately predictable: Data which are
static over time and are disclosed in different situations enable linkage of related
data. Examples of static data are date and place of birth. Similar to static data
are those which are quite accurately predictable or guessable because they follow
some rules. An example is the number of children, which can only stay the same or
increase. If static identity information is used for purposes such as authentication,
this bears a risk because these data cannot easily be revoked and substituted, as
with fingerprints used in biometric access systems.

• Data may be (initially) determined by others: Data that the individual concerned
cannot determine herself (e.g., the first name) may persist or it may take a signif- 
icant amount of time or great effort to change them. A special case is the inher- 
itance of properties from others, e.g., the DNA being inherited from the natural 
parents. 

• Change of data by oneself may be impossible or hard to achieve: If data are static
(see above) or if data are not under the individual’s control, wilful changes may 
not be possible. Examples are data processed in an organisation. 

• Inclusion of non-detachable information: There are data that cannot be disclosed
without simultaneously also disclosing some side information tied to the data. 
Examples are simple sequence numbers for identity cards, which often reveal 
gender, birth data and at least a rough timeframe of when the identity card was 
issued. 

• Singularising: If data enable the recognition of an individual within a larger
group of individuals, the individual may be tracked or located, even if other per- 
sonal data of the individual are kept private. 

• Prone to discrimination or social sorting: There are no data that are definitely
resistant against possible discrimination forever. This does not require the individual
to be identified or singularised. If some people disclose a property and others
refuse to do so, this already allows for social sorting or positive discrimination.


Note that this list of sensitive properties extends the enumeration of special cate- 
gories from Art. 8 Data Protection Directive (“personal data revealing racial or eth- 
nic origin, political opinions, religious or philosophical beliefs, trade-union mem- 
bership, and the processing of data concerning health or sex life”). Because of the 
sensitivity of the listed personal data, everybody should be careful with related data 
processing. 


4.1.2.2 Linking Data 


When taking into account that each and every transaction in an information system 
leaves a digital trace behind, it becomes clear that the accumulation of data can be 
enormous and that it thus becomes easier to link data to a single individual. The 
more data are available, the more unique the combinations of these data become. The more
data are disclosed to a party over time, the lower the fingerprinting threshold becomes,
which means that after a certain amount of time individuals can be uniquely iden- 
tified and recognised when they appear in a new transaction, even when this new 
transaction is done from another computer or another location than previous ones 
[Con09]. 
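
The effect described above can be illustrated with a minimal sketch. It counts, for a toy population, how many records become unique as more attributes are disclosed. The population, the attribute names and the function anonymity_set_sizes are hypothetical and invented for illustration only; they are not taken from [Con09].

```python
from collections import Counter

def anonymity_set_sizes(records, attributes):
    """For each record, count how many records share its values on `attributes`."""
    def key(record):
        return tuple(record[a] for a in attributes)
    counts = Counter(key(r) for r in records)
    return [counts[key(r)] for r in records]

# Hypothetical toy population (assumption, for illustration only).
population = [
    {"zip": "24103", "birth_year": 1980, "browser": "Firefox"},
    {"zip": "24103", "birth_year": 1980, "browser": "Opera"},
    {"zip": "24103", "birth_year": 1975, "browser": "Firefox"},
    {"zip": "01069", "birth_year": 1980, "browser": "Firefox"},
]

disclosed = []
for attribute in ("zip", "birth_year", "browser"):
    disclosed.append(attribute)
    sizes = anonymity_set_sizes(population, disclosed)
    unique = sum(1 for size in sizes if size == 1)
    print(disclosed, "-> uniquely identifiable records:", unique)
```

On this toy data, the number of uniquely identifiable records rises from one to two to four as the three attributes are disclosed one after another.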

The possibilities of linking data that belong to one individual also have their 
drawbacks on the dynamics in the surroundings of the individual. At first glance, 
it might seem that changing surroundings can have a positive influence on identity 
management, because contexts can be separated relatively easily. Nevertheless, link- 
ability appears possible when disclosing information or revealing certain patterns 
over time, meaning that the different surroundings can be connected as belonging 
to the same individual as well, therewith creating more challenges with regard to 
identity management. These challenges are aggravated by the fact that the
Internet does not forget. There is no digital oblivion. Once data are revealed
somewhere, they persist over time and remain stored in databases, cache memories,
and so on.

Obviously, to make the connection between different data sets, some kind of 
connector is needed. This connector can be created with the help of sophisticated 
technologies, analysing clicking behaviour, click trails, and key strokes. Linkability 
between data (sets) can be based on an analysis of the data and revealing patterns. 
However, this becomes easier when a common identifier exists within different data 
sets (such as names, registration numbers and IP addresses) and is easiest when this 
identifier is a so-called unique identifier. Unique identifiers identify a single indi- 
vidual. Unique identification can be supported by governments who issue unique 
identification numbers to their citizens. These numbers are sometimes even used 
throughout different domains or areas of life, thereby linking these areas, and the 
data therein, as belonging to one single individual. In the PrimeLife project, a num- 
ber of European countries have been investigated on their use of unique identifiers 
in four formal areas (government, health care, education, employment) and their ap- 
plication range: In the Netherlands a unique identifier, the BSN, is commonly used 
in several settings which, in principle, allows for the construction of a compound 
identity, instead of the citizen having distinct identities in different areas. In Ger- 
many this is not the case. There is even a separation between the different federal 
states, which all have their own regime in certain contexts. However, all German 
citizens of age 16 and above are obliged to have a personal ID card, which is used 
for identification by public authorities. Up to now, this card only played a role in the 
physical world. Starting with the 1st of November 2010, the new German eID card
will enable trusted services to read out selected attributes over the Internet. Conse- 
quently, the holder of the eID card maintains control over which provider is allowed 
to read out which attributes. Additionally, the new eID card would enable the user 
to establish a secure pseudonymous communication with a given service provider. 
To this end, a unique pseudonym will be generated for each relation between an eID
card holder and a service provider. France and Austria also have ID cards, but no
unique identifier that is used throughout public services. Belgium, Sweden, Ireland,
and Poland do use unique identification numbers, often combined with ID cards. 
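
The difference between a unique identifier and a pseudonym per relation can be sketched as follows. This is only an illustration of the idea, assuming a secret stored on the card and a keyed hash; it is not the cryptographic protocol actually used by the German eID card. The names provider_pseudonym, card_secret and the provider identifiers are hypothetical.

```python
import hashlib
import hmac

def provider_pseudonym(card_secret: bytes, provider_id: str) -> str:
    """Derive a stable pseudonym for one (card holder, service provider) relation.

    Assumption: a keyed hash (HMAC-SHA256) stands in for the real mechanism.
    """
    mac = hmac.new(card_secret, provider_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()

card_secret = b"secret-that-never-leaves-the-card"  # hypothetical value

shop_first_visit = provider_pseudonym(card_secret, "shop.example")
shop_second_visit = provider_pseudonym(card_secret, "shop.example")
forum = provider_pseudonym(card_secret, "forum.example")

# The same provider recognises the holder again ...
print(shop_first_visit == shop_second_visit)
# ... but two providers comparing their records find no common identifier.
print(shop_first_visit == forum)
```

With such a construction, each provider can recognise a returning card holder, while the pseudonyms held by different providers cannot serve as a common identifier for linking their data sets.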

In the four formal areas, some differences do occur. Some countries have national 
student cards, others do not. The same goes for electronic health cards. The Nether- 
lands has no electronic health card, but is working on a central system, called EPD, 
which aims to improve health care by providing access to health records electron- 
ically. The system progresses towards a general comprehensive medical identity. 
In the area of employment, all investigated countries show centralised systems to 
register people that are unemployed. 


4.1.3 Concepts for Delegation 


In the context of identity management throughout life, one focus lies on investigat- 
ing the necessity of delegation for people who are not able to manage their needs of 
privacy for a limited time within a stage of life or forever. 

Delegation is a process whereby a delegate (also called “proxy”, “mandatary”
or “agent”) is authorised to act on behalf of a person concerned via a mandate of
authority (or for short: mandate) [HRS+10].

The mandate of authority usually defines in particular: 


• The scope of authority for the actions of a delegate on behalf of the person
concerned and

• when and under which conditions the delegate gets the power of authority to act
on behalf of the person concerned.


The delegate shall only act on behalf of the person concerned if the delegate has 
the actual power of authority and if his action lies within the scope of authority. 
The mere existence of a mandate is not sufficient if the delegate acts without actually having
the power of authority. The difference between mandate and
power of authority becomes clear in the following example: In working life, the 
schedule of responsibilities may determine that person A should take over the work 
of colleague B if the latter is absent. The issuance of the mandate of authority to 
A is expressed by the schedule of responsibilities, but A’s actual power of authority 
only comes into existence if B is absent. Otherwise A must not act on behalf of B. 
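
The distinction between mandate and power of authority can be made concrete in a minimal sketch, assuming that a mandate is modelled as a scope of actions plus a condition on the current situation. The class Mandate, its fields and the action name are hypothetical and serve only to restate the working-life example above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

@dataclass(frozen=True)
class Mandate:
    person_concerned: str
    delegate: str
    scope: FrozenSet[str]                # actions covered by the mandate
    condition: Callable[[Dict], bool]    # when the power of authority exists

    def has_power_of_authority(self, situation: Dict) -> bool:
        """The mandate exists all the time; the power only under its condition."""
        return self.condition(situation)

    def may_act(self, actor: str, action: str, situation: Dict) -> bool:
        """Acting requires the power of authority AND an action within scope."""
        return (actor == self.delegate
                and action in self.scope
                and self.has_power_of_authority(situation))

# Schedule of responsibilities: A takes over B's work only while B is absent.
mandate = Mandate(
    person_concerned="B",
    delegate="A",
    scope=frozenset({"answer_customer_mail"}),
    condition=lambda situation: situation.get("B_absent", False),
)

print(mandate.may_act("A", "answer_customer_mail", {"B_absent": False}))
print(mandate.may_act("A", "answer_customer_mail", {"B_absent": True}))
```

The mandate object exists in both calls, but only in the second one does A actually have the power of authority.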

The mandate of authority is issued by the delegator (also called “mandator”).
This may be the person concerned herself, but there are also cases where other 
entities explicitly decide on the delegation (e.g., in the case of incapacitation of a 
person, the guardianship court rules on delegation) or where the delegation is fore- 
seen in law (e.g., when parents are the default delegates of their young children). 
The mandate of authority is usually assigned for a specific period of time. Similar 
to the process of issuing a mandate, changing or revoking the mandate can be done 
by the delegator, i.e., by the person concerned herself or by other entities. The con- 
ditions and processes to issue, change, or revoke a mandate can be defined by the 
underlying contract or law.

Note that the delegate is not always aware of the mandate of authority or of the
fact that he actually has the power of authority. So the delegator should implement 
an appropriate way of informing the delegate (and the person concerned if she is not 
the delegator herself) about the mandate and the power of authority. 

To supervise the delegation and related actions by the parties involved,
one or more impartial delegation supervisors may be appointed by one or
more of the actors. In particular, the person concerned may have the need to check 
whether the delegate really acts as agreed upon. 

The current civil legal framework encompasses several instruments regulating 
legal representation or agency, which have an effect with regard to the exercise of
fundamental rights: For minors, the instrument of parental care is known in civil 
law. Most of the EC Member States also have legal regulations regarding the repre- 
sentation of children. The Article 29 Data Protection Working Party defined in its 
Opinion 2/2009 [DPW09] principles regarding children’s privacy, which we gener- 
alise in the following to the relation of persons concerned and delegates regarding 
privacy-relevant actions: 


• The delegate should act in the best interest of the person concerned. This may
comprise protection and care, which are necessary for the well-being of the
person concerned.

• Guidelines for delegation should be defined beforehand.

• The person concerned and her delegates may have competing interests. If
conflicts cannot be avoided, it should be clarified how to sort them out, possibly with
the help of external parties. Note that a delegate does not necessarily stand in
for all partial identities of the person concerned, which may lead to additional
conflicts of interest of parties involved.

• The degree of delegation should be geared to the capabilities of the person
concerned regarding privacy and self-determination. This means that the degree of
accountability of the person concerned has to be adapted over time, and
regarding privacy-relevant decisions taken by the delegate, the person concerned has a
right to be consulted.


Each stage of life raises significant questions about how to handle identity management,
in particular personal data, and therefore has different requirements. It is quite
clear that a baby is physically less able than a 10-year-old to interact with technical
devices. The privacy protection rights of an individual are thus exercised
by different people over the course of a lifetime. This calls for a delegation system in
which it is clear to all parties involved who can exercise which rights at which moment
and in which context. The consequences of the delegate’s actions may influence the
privacy of both the person concerned and, to a certain extent, the delegate herself. The
following subsections explore various stages of life with respect to delegation.


4.1.3.1 Fruit of the womb


Privacy throughout life comprises a very early stage of life, the prenatal phase of 
an individual. Even in this stage of life, there might be the need to protect personal 
data, for example, considering the privacy implications of prenatal DNA tests. In 
many EU Member States, there are discussions about the issue of genetic analysis 
and the threat that using genetic data poses to the individual’s right of informational
self-determination as well as potential discrimination. More detailed regulations re- 
garding requirements for genetic analysis and the use of genetic data could be a 
solution. 


4.1.3.2 Children and teenagers 


Growing autonomy is an important issue in the protection of children’s rights, in 
any area of law. The complexity of situations involving minors is based on the fact 
that children, despite having full rights, need a representative (delegate) to exercise 
these rights — including their privacy rights. 

Data protection for children starts within the first days after birth, with the
processing and storage of birth data or medical data within the hospital. The protection
of the personal data of children lies more or less in the responsibility of parents
or legal guardians as delegates appointed by law. But when a child grows up,
other persons responsible for data processing in different areas of life may become
involved, such as teachers, doctors or supervisors [HPS08].

The rights of the child, and the exercise of those rights — including that of data 
protection — should be expressed in a way that recognises both of these aspects of 
the situation [DPW09]. Until a certain age, children have no way to monitor data 
processing, simply because they are too young to be involved in certain activities. 
If their delegates (parents or other representatives) decide, for example, to put the 
child’s pictures on their profile in a social network, it is the delegate who makes 
the decision about the processing of the children’s data and gives the consent to do 
so on behalf of the child. Normally, putting pictures of another person in a social 
network profile requires consent of that person, the data subject. In the situation 
described here, the delegate (e.g. parents) is entitled to express the consent in the 
name of the child. Such situations may put delegates in a double role: that of data
controllers, when publishing their child’s personal information openly on the Web,
and, at the same time, that of consent issuers as the child’s representatives. This double
role may easily lead to conflicts. Parents must take great care not to cross the line of 
the child’s best interest when processing the child’s data. 

It is necessary for the delegate (e.g. parents or other representatives) to listen 
carefully to the interests of the child at least beginning from a certain age and con- 
sider those interests when making a privacy-relevant decision, as that decision is 
binding for the child [DPW09]. When the child reaches legal age, she may want to
revise earlier decisions of the delegate. Therefore the child needs to know which
decisions about the processing of personal data were made by the representatives. From
then on, the child needs to give her own explicit consent for the processing of personal
data. In some operations, this may be implemented by reminding the operator
that the person is now over 18 and that her explicit consent is needed. This
is relevant in many circumstances, for example, medical matters, recreational
activities of the child, school matters, or agreements made by the delegate before the
child’s majority.

As children and teenagers are in the process of developing physically and men- 
tally, the rights of the child and the exercise of those rights — including the rights 
of data protection — should be accomplished in a way that recognises these aspects 
of the situation. In particular, adapting to the degree of maturity of children and
teenagers is a central aspect that has to be taken into account by their delegates.
Children gradually become capable of contributing to decisions made about them.
Naturally, the level of comprehension is not the same for a 7-year-old
child and a 15-year-old teenager.¹ This, in particular, has to be recognised by
the children’s representatives. Therefore the children should be consulted more reg- 
ularly by adults, teachers, care-takers or other delegates about the exercise of their 
rights, including those related to data protection. 

The children’s delegate should also think about a way to document privacy- 
relevant decisions so that the children or young adults can later easily understand 
what personal data have been disclosed to whom and under which conditions. They 
may also then choose to actively approach certain data controllers to give or revoke 
consent concerning data processing or to request access, rectification or erasure of 
their personal data. 
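
Such documentation could, for instance, take the form of a simple decision log. The following sketch assumes a record per privacy-relevant decision and an age of majority of 18, as mentioned above; the class PrivacyDecision, the helper functions and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class PrivacyDecision:
    decided_on: date
    decided_by: str    # the delegate who took the decision
    data: str          # which personal data were disclosed
    recipient: str     # to whom (the data controller)
    conditions: str    # under which conditions

def age_on(birth_date: date, day: date) -> int:
    """Age in completed years on a given day."""
    before_birthday = (day.month, day.day) < (birth_date.month, birth_date.day)
    return day.year - birth_date.year - int(before_birthday)

def needs_reconfirmation(log: List[PrivacyDecision], birth_date: date,
                         today: date, age_of_majority: int = 18):
    """Decisions taken by delegates before majority, listed once the person is of age."""
    if age_on(birth_date, today) < age_of_majority:
        return []
    return [d for d in log if age_on(birth_date, d.decided_on) < age_of_majority]

log = [
    PrivacyDecision(date(2010, 1, 1), "parent", "photo", "social network", "public profile"),
    PrivacyDecision(date(2019, 1, 1), "self", "e-mail address", "web shop", "order processing"),
]
pending = needs_reconfirmation(log, date(2000, 5, 10), date(2018, 5, 10))
print([d.recipient for d in pending])
```

A log of this kind lets the young adult see what was disclosed to whom, and lets an operator identify the decisions for which her own explicit consent is now needed.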


4.1.3.3 Adults lacking privacy management capabilities 


For adults who temporarily or permanently need support, or who require
others to act on their behalf in decisions concerning their private sphere,
a distinction has to be made between delegation for legally relevant actions and
for non-legally relevant actions. All legally relevant actions regarding the processing
of personal data are based on national legal regulations such as delegation or legal
guardianship.

In case of non-legally relevant actions, such as help with a social network or the 
Internet, in general the person concerned can freely decide what to do. The delegator 
or person concerned could choose a delegate (for example, a care-taker) to act in the 
name of the person on the basis of a contract to manage the private sphere. Then 
the person concerned should clearly define her expectations and needs regarding the 
representation and the power of disposal. 


1 The level of comprehension is defined in different ways. For instance the US-American
Children’s Online Privacy Protection Act (COPPA, Title XIII, Children’s online privacy
protection, SEC. 1302) defines a child as an individual under the age of 13.


4.1.3.4 Deceased people


When a person has died, the instrument of the law of succession applies.
The European Data Protection Directive 95/46/EC assigns the right of privacy
and data protection to “natural persons” (Article 1). Deceased persons are no longer
regarded as data subjects. In some European legal frameworks², protection against
unregulated processing of data concerning deceased individuals is provided by means
of a “post-mortal personality right”. In some situations, the instrument offered by
the law of succession might not be sufficient, and further regulations are needed.

For instance, some users of social networks want their profile to exist even af- 
ter death or at least would like to be informed as to how the provider handles the 
personal data and the profile after death. Here, providers of social networks are
required to find mechanisms and concepts for handling profiles after the
death of the user. Various mechanisms are conceivable, for example, the user could
determine during the registration process how her profile should be handled after death
(deletion, blocking, proxy to contact, etc.). Therefore, SNS providers need to
define clear measures and concepts to determine the handling of profiles after a user’s
death. In some situations, even autonomous action of the SNS provider might be
essential for the protection of users. For example, if an SNS user dies and the press
accesses the SNS site to copy pictures, contacts, etc. of the dead user, the provider
has to balance the protection of the user’s rights against its competence to, for example,
block the profile without the consent of the legal assignee (because this has to
happen very quickly).

Meanwhile, new services appear on the market that offer to send out secure mes- 
sages to friends after the death of the user. Their goal is to give people a safe way to 
share account passwords, wills and other information. When users book the service 
against payment of a fee, they are given options for when to send messages or to 
delete some messages permanently after their death. It is problematic if authentication
credentials of the user have to be transferred to the service, because this opens the
way for misuse: others cannot distinguish whether the user or the
service is acting.


4.1.3.5 Delegation based on explicit decisions of the data subject 


Civil law recognises the instrument of legal representation for cases where the
individual concerned is fully in possession of her mental capabilities and decides on
her own to transfer the exercise of rights to another person (for example, Articles
172 et seq. of the German Civil Code³). Various reasons exist why a data subject may
wish to transfer the full or partial legal authority of representation to another
individual (the mandate of authority). For example, a person may simply be unavailable


2 Such as Germany: so-called “Mephisto decision” of the German Constitutional Court; BVerfGE
30, 173.

3 English translation of Bürgerliches Gesetzbuch (German Civil Code) is available here:
http://bundesrecht.juris.de/englisch_bgb/englisch_bgb.html.

for a longer period of time with no access to information technology, which would
allow transmitting and enforcing remote decisions (for example, during a scientific 
or recreational journey to a secluded region). Or a data subject may feel that cer- 
tain services which are handled online are better understood by friends or even a 
professional data custodian. Actions of and decisions by the delegate (authorised 
representative) may have consequences for the fundamental rights of the delegator. 
The delegator may have, at first glance, authorised the delegate via a mandate of
authority that, for example, only granted the delegate authority to
conclude one contract on the delegator’s behalf. Fulfilling the contractual duties, however,
will possibly also require the processing of personal data. The legal authority
to represent a person concerned in concluding a contract includes the implied
authority to initiate the data processing steps necessary to fulfil the primary goal. The
instrument of legal representation based on the data subject’s declared intention may 
also have an effect after the data subject’s death. The data subject may during her 
lifetime lay down a last will which binds the heirs. This last will may also comprise 
decisions regarding how to treat documents or electronic files containing personal 
data. 

The Art. 29 Working Party defined in its Opinion 2/2009 [DPW09] principles
regarding the exercise of children’s rights. These principles may also be helpful for
determining principles on delegation in general, because delegators (persons
concerned) may face the problem that delegation in privacy-relevant situations might
be interpreted in different ways. This means that the parties involved may have different
expectations of what constitutes good practice in handling privacy.


4.2 Demonstrator 


A scenario that is relevant to all areas and stages of life and that deals with dynamics
and possible delegation (that is, with the challenges described in
Sections 4.1.1 and 4.1.3) is the area of backup and synchronisation tools and
applications.

Many backup systems and backup strategies, which have been available for many 
years, are already dealing with the problem of unwanted data loss. However, they are 
mostly protecting the raw data only and do not involve the data subject, his specific 
characteristics, social relations and interactions as a part of their scope. Existing 
backup systems and backup strategies also do not reflect the process of evolution of 
the data subject during his lifetime with respect to the possible different states he 
might pass through during his lifetime and which might have an immense influence 
on his ability to manage his data on his own behalf (e.g., illness, hospitalisation, 
or death). Additionally, existing systems and strategies dealing with the problem of 
unwanted data loss do not cope with boundaries among distinct areas of the data 
subject’s social interactions. However, these aspects are nowadays becoming more
and more sensitive on the level of the data, hand in hand with the massive expansion
of the technology.

Therefore, we decided to analyse the problem of unwanted data loss from the
perspective of lifelong privacy. We found that current solutions do not provide a
sufficient level of data protection when it comes to a lifelong time span and the
privacy of the data subject holding the data. Based on our findings, we decided to
demonstrate that it is possible to cope with problems amplified by the requirements 
on lifelong privacy when protecting the data subject against unwanted data loss. 

The proposed privacy-enhanced backup and synchronisation demonstrator fo- 
cuses on the following problems closely linked together under the light of lifelong 
privacy: 


1. Protection of the data subject against unwanted data loss during his lifetime by
redundancy and physical distribution of the data;
Our findings led to the conclusion that, from the lifelong perspective, the problem
of unwanted data loss can be solved by redundancy and the physical distribution of
multiple copies of the data. Since backup and synchronisation
tools also deal with the problem of unwanted data loss, we decided to
base the main conceptual pillars of our demonstrator on backup and
synchronisation functionality. In the demonstrator, we propose to solve the
problem of unwanted data loss by taking advantage of services provided by
online storage providers which are nowadays available on the Internet (for example
Dropbox, Apple MobileMe, Windows Live SkyDrive, Ubuntu One and others)
and storing multiple copies of the data in a distributed environment. Distribution of
potentially sensitive backup data in such an environment, however, leads to
confidentiality problems.

2. Assurance of lifelong confidentiality of the data subject’s data stored in a
distributed environment;
The problem of data confidentiality in a distributed and untrusted environment
can be solved by encrypting the data. Encryption must assure that, by default, only
the authorised Data Subject (to whom the data belongs) is able to operate on
his data stored in distributed backups and that nobody else
implicitly has access to it, even after the death of the Data Subject. On the other hand,
during the lifetime of the Data Subject, unpredictable situations might occur
which temporarily or permanently limit his ability to access his own
data (for instance in case of illness, hospitalisation or death). This might lead
to situations in which his data, which might be important for other parties relying on it
(possibly in a legal relationship with the Data Subject), is not accessible to these
parties when needed (for example important work documents) or is permanently
lost.

3. Delegation of access rights to the data subject’s backup data allowing other
parties to operate with his data if specific conditions are fulfilled;
The delegation capability of the demonstrator allows other parties authorised by the
data subject (to whom the data belongs) to access his backup data if particular
conditions specified by the data subject are satisfied. Delegation of access rights to the
data subject’s backup data could in general lead to situations where authorised
parties with corresponding access rights are able to access not only the desired
data but also other data possibly covering other areas of the data subject’s life,
which they are not authorised to access. This might, however, not be desired by
the data subject himself.

4. Distribution of the backup data according to different areas of life of the data 
subject and his different partial identities. 
Distribution of the backup data according to particular areas of the data subject’s 
life or his different partial identities enables the data subject to manage his pri- 
vacy in such a way that allows him to physically and logically separate his data 
related to distinct domains of his social interaction. 
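
One conceivable way to combine encryption, delegation and separation by area of life is to use a separate key per area, so that a delegate who receives the key for one area learns nothing about backups of other areas. The sketch below shows only the key derivation, using a keyed hash from the Python standard library. This section does not describe the demonstrator’s actual key management, so the scheme, the function area_key and the area names are assumptions made for illustration. The encryption of the backup data itself is omitted.

```python
import hashlib
import hmac
import secrets

def area_key(master_key: bytes, area: str) -> bytes:
    """Derive an independent 256-bit key for one area of life from the master key.

    Assumption: HMAC-SHA256 with a fixed label is used as the derivation function.
    """
    label = b"backup-area:" + area.encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).digest()

# The master key stays with the primary user only.
master_key = secrets.token_bytes(32)

# One key per area of life (hypothetical area names).
keys = {area: area_key(master_key, area) for area in ("work", "health", "family")}

# A delegate for the "work" area would receive only keys["work"]. Backups of
# the other areas are encrypted under different keys, which cannot be computed
# from keys["work"] without the master key.
delegated_to_colleague = {"work": keys["work"]}
print(sorted(delegated_to_colleague))
```

The design choice here is that access rights follow keys: handing out a key for one area cannot accidentally open data of another area, which addresses the concern raised in item 3.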


Besides the above mentioned problems, additional non-trivial issues must be ad- 
dressed, which are covered by the high-level requirements on prototypes developed 
within the PrimeLife project. Since the demonstrator is based on backup
and synchronisation functionality, it also has to address further privacy-related
issues amplified by the nature of backup and synchronisation. Due to space limitations,
we cannot elaborate on all the requirements and possible solutions here. The interested
reader is referred to the PrimeLife document “Towards a Privacy-Enhanced Backup and
Synchronisation Demonstrator Respecting Lifetime Aspects” [Pri10c].

In order to correctly understand the descriptions in the following sections, it is 
helpful to be familiar with the following terminology: 


Terms: 


Primary item: an original item for which one or more backup items are created 
during the backup action. In a general sense, a primary item can be
any well-defined set of data which has one or more copies, called backup items,
dedicated for backup purposes. A primary item can be a file but it can also be
a more specific type of data such as, for instance, an e-mail, a contact, or even 
settings on the TV. 

Backup item: a copy of a primary item stored in the backup. A backup item re- 
flects the data of a primary item at the time when the backup item is created. 
Note that even if each backup item must belong to one and only one primary 
item, this primary item may not exist during the entire lifetime of the backup 
item. A backup item can exist in several versions at a particular point of time. 

Backup: a non-empty set of backup items.

Backup task: describes which specific set of primary items should be backed up 
to which storage provider according to which schedule. The output of a run of a 
given backup task is a backup. 
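
The terms above can be summarised as a small data model. The following is a minimal sketch under stated assumptions: all class and field names are hypothetical, a list stands in for the set of backup items, and the callable read, which returns the current content of a primary item, is invented for illustration. It is not the data model of the demonstrator itself.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List

@dataclass(frozen=True)
class PrimaryItem:
    item_id: str
    kind: str              # e.g. "file", "e-mail", "contact"

@dataclass(frozen=True)
class BackupItem:
    primary_item_id: str   # exactly one primary item, which may no longer exist
    version: int
    created_at: datetime   # the backup item reflects the data at this time
    content: bytes

@dataclass
class Backup:
    items: List[BackupItem]

    def __post_init__(self):
        if not self.items:
            raise ValueError("a backup is a non-empty set of backup items")

@dataclass
class BackupTask:
    primary_items: List[PrimaryItem]
    storage_provider: str  # where the backup is stored
    schedule: str          # e.g. "daily"

    def run(self, read: Callable[[PrimaryItem], bytes],
            now: datetime, version: int = 1) -> Backup:
        """One run of the task produces one backup."""
        return Backup([BackupItem(p.item_id, version, now, read(p))
                       for p in self.primary_items])

task = BackupTask([PrimaryItem("report.txt", "file")], "example-provider", "daily")
backup = task.run(read=lambda item: b"draft", now=datetime(2010, 11, 1))
print(len(backup.items), backup.items[0].primary_item_id)
```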


Actors: 


Primary user: data subject who owns/holds primary items. 

Storage provider: provides storage space for backups. 

Delegate: an entity that receives particular rights on the backup from a delegator. 

Delegator: an entity that has the privilege to delegate rights to delegates concern- 
ing a particular backup. In most applications of this demonstrator, the primary 
user acts as the delegator.

Delegate candidate: an entity that was selected by the delegator to act as a delegate
but does not possess particular rights yet. 

Delegation request: a request sent to the delegate candidate asking him whether 
he accepts particular rights from the delegator. 

Credential issuer: an entity that issues a credential verifying a certain status of the 
primary user. This status can for example be: “primary user is ill,” “primary user
is hospitalised,” “primary user is dead,” or others.


4.2.1 Overview of the Backup Demonstrator Architecture 


After describing the overall goals and visions with respect to the backup scenario 
in the previous section, we will report on the current state of the design and imple- 
mentation of the actual backup demonstrator in the rest of this chapter. 

The backup demonstrator consists of three main components: 


1. the core, which offers the main functionality. The core is written in Java and runs 
as a background process on the machine that contains the data (primary items) 
that should be backed up. The core makes its functionality accessible using a 
REST-like interface. 

2. a Web-based user interface (called “backup console”), which is written using
HTML, CSS, JavaScript and Ajax. It can be displayed using an ordinary web 
browser. It utilises the REST calls offered by the core to accomplish the tasks 
desired by the user. 

3. a tray icon shown in the notification area of the taskbar found in many modern 
operating systems. This tray icon informs the user about important information 
and status changes related to the backup process. Moreover, it allows the user to 
launch the backup console. 


4.2.1.1 Basic building blocks used by the core 


The core is the central place that provides all the methods necessary to create backup 
tasks and actual backups, to restore them, to manage delegations etc. Moreover, it 
manages all the data (configuration information, credentials etc.) related to this. 
The entire functionality is provided through REST-like calls, with HTTP as the
underlying application-level protocol. The core therefore contains an embedded
web server (namely Jetty⁴). The binding between the HTTP URLs and the Java
methods is done with the help of the “Java API for RESTful Web Services”
(JAX-RS⁵). JAX-RS uses Java annotations, which simplifies the process of making
a Java method available as a web service. In particular, the Jersey⁶ reference
implementation of JAX-RS is used.

⁴ http://eclipse.org/jetty/
⁵ http://jcp.org/en/jsr/detail?id=311

In case the URL itself does not encode all parameters and data necessary for 
a given REST call, the HTTP body contains the additional data needed as input 
for the given REST call. The data transmitted in the HTTP body can be encoded 
either using XML or JSON. The desired encoding type is set by the client. The 
marshalling/unmarshalling is done using JAXB⁷ (in case of XML encoding) and
Jackson⁸ (in case of JSON encoding). Both APIs allow an automatic mapping between
the internal representation and the external one, thus avoiding any need for
manually implementing serialisation methods for every data type used. 
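
The idea that the client chooses the encoding of the HTTP body can be illustrated from the client's point of view. The following is a minimal sketch in Python using only the standard library; it does not use JAXB or Jackson, and the element name `backupTask`, the field names and the restriction to flat string-valued data are assumptions made for this example.

```python
import json
import xml.etree.ElementTree as ET


def encode_body(task: dict, content_type: str) -> bytes:
    """Serialise a flat dict as JSON or XML, depending on the client's choice."""
    if content_type == "application/json":
        return json.dumps(task).encode("utf-8")
    if content_type == "application/xml":
        root = ET.Element("backupTask")          # hypothetical element name
        for key, value in task.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="utf-8")
    raise ValueError(f"unsupported encoding: {content_type}")


def decode_body(body: bytes, content_type: str) -> dict:
    """Inverse of encode_body for flat, string-valued data."""
    if content_type == "application/json":
        return json.loads(body.decode("utf-8"))
    if content_type == "application/xml":
        return {child.tag: child.text for child in ET.fromstring(body)}
    raise ValueError(f"unsupported encoding: {content_type}")
```

In a real client, the chosen content type would also be sent in the HTTP `Content-Type` header so that the receiving side knows which unmarshaller to apply.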


4.2.1.2 File system and external storage providers 


When it comes to the question of where to store the backups, the backup
demonstrator follows the trend to store data “online,” e.g., by using dedicated
online storage providers or, more generally speaking, “in the cloud.” Although
storing backup data more “locally” is supported by the backup demonstrator (e.g.,
on an external hard drive connected to the machine whose data should be backed
up), using online storage is the more important use case. The reason for this is
not that it follows a trend but that it serves the “lifelong” aspects which should
be demonstrated.

On the one hand, the backup scenario implies that the data (backups) are avail- 
able for the entire life of the primary user. Clearly managing a large set of external 
hard drives would lead to much more burden on the user compared to managing 
contracts with online storage providers. Moreover, by using online storage, the data 
is accessible from every place in the world where an internet connection is available. 
This makes the whole process of delegating access rights to a backup much more 
feasible. Finally, it is much easier to store the backups as truly redundant copies, 
which in turn is a precondition for avoiding long term data losses. 

On the other hand, using online storage leads to interesting research questions 
with respect to the “lifelong privacy” aspects. First of all, the backed up data needs 
to be stored in a way so that only authorised entities can gain access to the actual 
backup content. Besides this need to achieve confidentiality of the backed up data, 
the “privacy” of the involved entities (e.g., primary user, delegates etc.) should be 
assured as well. In our concept, it means that each online storage provider should 
learn as little information as possible about the involved parties (e.g., who accesses 
which backup, at which time, how often etc.). 

The demonstrator uses the “Apache Commons Virtual File System”⁹ library,
which allows access to various different local and remote file systems by a single
API. Besides the online storage providers which are supported by default (like
SFTP, WebDAV, FTP etc.), plug-ins for well-known online storage providers (like
Dropbox or Ubuntu One) were developed. Although these plug-ins were deployed
together with the software of the demonstrator, conceptually they could be
downloaded from the Internet as well. The general idea is that either some kind of
directory service lists available online storage providers and provides the necessary
plug-ins or that at least a given online storage provider offers a plug-in for its
service on its web site.

⁶ https://jersey.dev.java.net/
⁷ http://jcp.org/en/jsr/detail?id=222
⁸ http://jackson.codehaus.org/
⁹ http://commons.apache.org/vfs/

A storage provider plug-in would not only implement all the functionality
necessary to actually access the online storage but would also provide user interface
components which allow a user to create a new account with this storage provider.
This in turn covers, e.g., the necessary SLA and the payment.

With respect to the trustworthiness (or, more generally, the properties) of an online
storage provider, the main assumptions are related to availability, i.e., it is assumed 
that some data stored at a given storage provider would be (with high probability) 
available according to the negotiated SLA. Beyond that, each storage provider is 
seen as a potential attacker (in the sense of the concepts of multi-lateral security). 
Especially, it should not be necessary to trust the storage provider with respect to 
confidentiality or integrity of the stored data. Nor should the storage provider be 
trusted with respect to the privacy of a given user. Therefore the backup
demonstrator needs to implement appropriate measures to achieve these protection goals
(e.g., by means of cryptographic mechanisms like encryption, integrity protection 
etc.). 

In order to reduce the linkability between different transactions done by the same 
user with a given online storage provider (or multiple storage providers), it is as- 
sumed that a communication layer anonymisation service is used. If the access to the 
online storage is based on HTTP (like WebDAV, Dropbox etc.), existing
anonymisation services like AN.ON¹⁰ or Tor¹¹ could be used.

Nevertheless, there usually remains the linkability introduced at the application 
layer. Because we want to support existing (legacy) online storage providers, we 
cannot assume that they base their access control on unlinkable anonymous
credentials. Rather, the common combination of login/password would be used. In
this case, the only way to avoid linkability (e.g., the observation that two backups
belong to the same user) is for the user to create multiple accounts (ideally using
multiple online storage providers). Note that for the sole purpose of demonstration,
we plan to develop a special online storage provider that in fact uses anonymous
credentials for access control. More concretely, the implementation of this storage
provider will be based on the Identity Mixer¹² anonymous credentials, which are a
part of the PRIME Core¹³ developed within the EU FP6 integrated project “Prime”¹⁴
and the PrimeLife project.


¹⁰ http://anon.inf.tu-dresden.de/
¹¹ https://www.torproject.org/
¹² http://www.primelife.eu/results/opensource/55-identity-mixer/
¹³ http://www.primelife.eu/results/opensource/73-prime-core/
¹⁴ https://www.prime-project.eu/
4.2.1.3 Backup and Restore


“Backup” and “Restore” are the most prominent functionalities common to nearly 
any existing backup solution. Moreover, most backup tools operate on a data level, 
that is the user selects files, directories, partitions or whole disk drives. This kind of 
selection mode is supported in the PrimeLife backup demonstrator as well. 

In addition to these common data-level selection mechanisms, the demonstrator
offers an identity-based selection mode. Remember that a basic concept of
privacy-enhanced identity management is the separation into different partial
identities, areas of life and stages of life. Thus, the user can select from each of these
categories, e.g., specify which area(s) of life or partial identities he wants to back up.

Items of different types can be grouped together within a so-called container. A
container can be understood as a template of the primary items to be backed up,
which eases the related selection during the creation of a new backup task.

Areas of life, partial identities and files are related to each other (see Figure 4.3). 
An Area of Life typically covers multiple partial identities; likewise a partial identity 
is related to many files. Note that a file can be related to more than one partial 
identity and area of life, e.g., a digital photo that shows persons from the “Family
Area of Life” as well as the “Working Area of Life.” 

In the field of privacy-enhanced identity management, one of the basic reasons 
for differentiating between different partial identities, areas of life etc. is to avoid 
unwanted linkability between them. Consequently, this unlinkability property has 
to be preserved for the backups as well. Thus, even if the user can group different 
partial identities or areas of life together for creating one logical backup task, the 
actual backup data need to be stored separately. Assume for instance, that the user 
selects files that belong to two different partial identities. In this case, two different 
backups will be created, ideally stored on two different online storage providers. 
Otherwise an attacker might learn that the two partial identities in question actually 
belong to one and the same user. 

You might wonder how the backup demonstrator knows about the existing ar- 
eas of life, partial identities and the relation among them and the files stored on the 
given machine. Conceptually, this information is provided as a core functionality 
of a privacy-enhanced identity management system, which in turn is out of scope 
of the backup demonstrator itself. Thus, for the backup demonstrator, we simply 
assume that such an IDM system is in place. However, it does not really exist to- 
day in practice. Therefore, we decided to create mockup data (areas of life, partial 
identities, files and folders) that are used for demonstrating the privacy-preserving 
backup aspects related to them. Nevertheless, the backup demonstrator allows for 
creating backups of real files and folders, but these items are not associated with any 
partial identity or area of life. Thus, from a privacy (linkability) point of view, they 
are treated as being “equal” and therefore are stored within a single backup. 
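
The separation rule described in this section can be sketched as a small planning function: files selected in one logical backup task are split into one backup per partial identity, and files without any associated partial identity end up in a single common backup. This is a minimal sketch in Python under simplifying assumptions: each file is related to at most one partial identity, the mapping is given as a plain dictionary (standing in for the IDM system), and all names are hypothetical.

```python
from collections import defaultdict


def plan_backups(selected_files, identity_of, providers):
    """Split a selection into one backup per partial identity.

    selected_files: iterable of file paths
    identity_of:    dict mapping a path to its partial identity; a missing
                    entry means the file is not associated with any identity
    providers:      list of storage provider names to spread the backups over
    Returns a list of (provider, partial_identity, sorted file list) tuples.
    """
    groups = defaultdict(list)
    for path in selected_files:
        groups[identity_of.get(path)].append(path)   # None = unassociated

    plan = []
    ordered = sorted(groups, key=lambda identity: identity or "")
    for index, identity in enumerate(ordered):
        # Round-robin assignment: with fewer providers than identities, some
        # identities share a provider, which weakens unlinkability.
        provider = providers[index % len(providers)]
        plan.append((provider, identity, sorted(groups[identity])))
    return plan
```

With at least as many providers as groups, every partial identity is stored at a different provider, which corresponds to the ideal case described above.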


4.2.1.4 Delegation 


Delegation is considered to be one of the most important aspects to be shown by
the demonstrator. It is not only an innovative feature in itself, which is not
common in today’s backup tools, but it also has many implications with respect
to the area of lifelong privacy, which in turn is the underlying motivation for the
demonstrator.

From a functional point of view, delegation means that a primary user (the del- 
egator) delegates access rights to a particular backup to some delegates (see Fig- 
ure 4.4). These access rights are usually bound to a policy describing under which 
circumstances the access should be granted. A typical use case would be that a user 
delegates the access rights to the files related to his work partial identity/area of life 
to his colleagues under the policy that the access is only permitted if the user be- 
comes ill. Moreover, the policy would express the obligation that the user needs to 
be informed if one of his colleagues really accesses the backup. 

Delegation does not only deal with the privacy of the delegator but also with 
the privacy of the delegates. Therefore, a delegate will be asked if he is willing to 
accept the delegation request. In the spirit of informed consent, he will also be
informed that each access to the backup would be reported to the delegator.
Fig. 4.4: Management console of the backup demonstrator showing the interface for
selection of delegates. 


It is currently an open question as to how much additional “meta-data” about 
a given backup will be communicated to a delegate. On the one hand, a delegate 
might want to know beforehand which files are included in a given backup, so that 
he can better decide if he really wants to accept the delegation or access the backup, 
respectively. On the other hand, this information might already have some negative 
impact on the privacy of the delegator. For enhancing the privacy of the delegator, 
it is desirable that a delegate only learns that information if he actually accesses 
a given backup. Thus, if the conditions defined in the access policy never become 
true, any unnecessary information flow will be avoided. 

In the current version of the demonstrator, the list of possible delegates is prede- 
fined as mockup data. There are plans to integrate other sources of information for 
these address book-like data. Natural sources are social networks such as Facebook, 
Xing, LinkedIn etc. Moreover, the current demonstrator uses plain e-mail messages 
for transmitting delegations and the related acceptance responses. Future versions 
of the demonstrator might use the communication infrastructure offered by the men- 
tioned social networks as well. 

The delegation itself is an XML document describing the contents of the backup, 
the location of the backup and under which circumstances (policy) the backup can 
be accessed by the delegate. The delegation might also transfer some credentials 
(e.g., issued by the delegator) to the delegate, which the delegate will need in order 
to access the backup. 

Conceptually, a lot of modern cryptographic mechanisms exist that could be used 
to construct a privacy-enhanced protocol for the purpose of delegation. Such a so- 
lution would require infrastructural support that is not in place today. Examples 
would be anonymous credentials issued by governmental or public institutions, pub- 
lic key infrastructures, attribute based access control mechanisms etc. Therefore we 
decided to implement a much simpler scenario, which on the one hand is much
closer to current practice and on the other hand integrates components developed
within the PrimeLife project. It thus demonstrates not only delegation itself but
also the interplay of these components.

Our delegation scenario comprises the following entities/components: 


1. The eCV, a component which allows a user to store an electronic version of his
CV. This component was developed within the PrimeLife project to demonstrate
privacy aspects in service oriented architectures (see Section 21).

2. A trusted third party (TTP) utilising the PrimeLife policy engine (see Section 20).

3. A legacy online storage provider.

4. The delegator.

5. A delegate.


In this scenario, we demonstrate how a delegator can delegate the access to his 
backup to a delegate who can access the backup in case the delegator is ill. The 
delegation then would comprise the following steps: 


1. The delegator stores the encrypted backup at the legacy online storage provider.
The encryption is done using a symmetric cipher and the key $k$.

2. The delegator generates a random value $k_1$ and calculates $k_2$ such that
$k = k_1 \oplus k_2$. Note that $k_2 = k \oplus k_1$, i.e., $k_2$ can be seen as a
one-time pad encryption of $k$ using the key $k_1$.

3. The delegator stores $k_1$ at the TTP. A PPL policy regulates the access to $k_1$,
saying that:


Only give access within a certain time frame and

to someone who can present a credential $C_1$ and

a credential proving that the delegator is currently ill.

The obligation is to delete $k_1$ after the time frame is over and
to inform the delegator if someone has accessed $k_1$.


4. The delegator sends $k_2$, $C_1$, $C_2$ to the delegate (encrypted under the public key of
the delegate). The delegator informs the delegate about the circumstances under
which (illness of delegator and valid time frame) and how the delegate can access
the backup.
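
The access policy and the obligations of step 3 can be sketched as a simple check performed by the TTP. This is a minimal sketch in Python and not an implementation of the PPL policy engine: credentials are modelled as plain strings and a boolean flag, whereas a real TTP would verify them cryptographically, and all names are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class KeyShareRecord:
    """What the TTP holds for one delegation (field names are illustrative)."""
    share: bytes                 # the key share k_1
    not_before: datetime         # start of the permitted time frame
    not_after: datetime          # end of the permitted time frame
    required_credential: str     # stands in for credential C_1
    notifications: list = field(default_factory=list)


def request_share(record, now, credential, illness_proof_valid):
    """Return k_1 if the policy is satisfied, otherwise None."""
    # Obligation: delete k_1 once the time frame is over.
    if now > record.not_after:
        record.share = None
    if record.share is None or now < record.not_before:
        return None
    # Policy: credential C_1 and a proof that the delegator is currently ill.
    if credential != record.required_credential or not illness_proof_valid:
        return None
    # Obligation: inform the delegator about every successful access.
    record.notifications.append(f"share accessed at {now.isoformat()}")
    return record.share
```

In this sketch, the deletion obligation is only enforced lazily at the next request; a real deployment would more likely delete the share as soon as the time frame ends.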


Now let’s assume that the delegator does in fact become ill. In this case, the
delegator (or his doctor) sends a certificate of illness to the eCV of the delegator.
Moreover, the access policy for that certificate says that someone who shows
credential $C_2$ is allowed to access the certificate of illness (in this case the delegator
should be informed by the eCV).

In case the delegate wants to access the backup of the delegator and thus uses the 
rights delegated to him, the following steps happen: 


1. The delegate downloads the encrypted backup. How the access control to the
encrypted backup is done depends on the methods provided by the legacy online
storage provider. Usually there exist some means of sharing the stored data with
a defined set of users. But a broader discussion of this issue is out of scope here.

2. The delegate shows credential $C_2$ to the eCV and requests the certificate of
illness of the delegator. Note that the delegate has received $C_2$ in step 4 during the
delegation process.

3. The delegate requests $k_1$ from the TTP. He therefore shows credential $C_1$ together
with the certificate of illness of the delegator to the TTP. The delegate gets $k_1$ and
the TTP informs the delegator about that access to $k_1$.

4. Now the delegate is able to calculate $k = k_1 \oplus k_2$ and can decrypt the encrypted
backup of the delegator.
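
The key handling in the two lists above can be sketched in a few lines: the delegator splits the backup key $k$ into the two shares $k_1$ and $k_2$, and the delegate later recombines them. This is a minimal sketch in Python; the function names are assumptions made for this example, and the symmetric encryption of the backup itself is left out.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equally long byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(k: bytes):
    """Return (k1, k2) such that k == k1 XOR k2, with k1 chosen at random."""
    k1 = secrets.token_bytes(len(k))   # share stored at the TTP
    k2 = xor_bytes(k, k1)              # share sent to the delegate
    return k1, k2


def recombine(k1: bytes, k2: bytes) -> bytes:
    """Recover k once the delegate holds both shares."""
    return xor_bytes(k1, k2)
```

Because $k_1$ is uniformly random and as long as $k$, neither share on its own reveals anything about $k$; only a party holding both shares can decrypt the backup.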


Of course, the description above shows a very brief overview explaining the 
general ideas we have with respect to using existing components developed by the 
PrimeLife project for the backup demonstrator. Further research needs to be done 
to avoid/remove all the linkage that remains between the different steps (e.g., using 
different transaction pseudonyms for delegator and delegate etc.). Moreover, the
information the various parties learn should be minimised. For example, the TTP does
not need to know that the “certificate of illness” actually certifies an illness, so
“meaningless” credentials could be used here. As a final remark, please note that all
the parties mentioned above can be distributed (in the usual k out of n setting). To 
some extent, the involvement of the TTP already is a kind of distribution, because 
the eCV and the TTP can be seen as entities that store information accessible under 
a given access policy. 


4.2.2 Deployment and Usage of the Demonstrator 


Due to the platform independent design based on Java and web-technologies, the 
demonstrator can be installed and run on many modern operating systems including 
Linux, Mac OS X and Windows. Nevertheless, we decided to create a dedicated 
virtual machine based on VirtualBox¹⁵ as virtual machine monitor and Ubuntu¹⁶
Linux as guest operating system, which contains all the necessary components
pre-installed. This makes the process of “playing” with the demonstrator much easier,
especially when it comes to the more complex delegation scenarios. Besides the
demonstrator itself, the virtual machine contains a local WebDAV online storage provider,
two separate user accounts (one for the primary user/delegator and one for the dele- 
gate), a local e-mail infrastructure (SMTP and IMAP servers) etc. In order to make 
the installation of the virtual machine itself as easy as possible, a screencast explain- 
ing the necessary steps was created. 


¹⁵ http://www.virtualbox.org/
¹⁶ http://www.ubuntu.com/
4.3 Concluding Remarks


In this chapter, the concept of the privacy-enhanced backup and synchronisation 
demonstrator was presented. It was shown that the objectives of lifelong privacy 
lead to practical results, which can be applied for solving real-life issues in enhanced 
ways. Our demonstrator reveals new problems that emerge as soon as lifelong as- 
pects related to the data subject are taken into consideration. We presented a new 
approach, which can help the average citizen to protect himself against unwanted 
data loss respecting his different partial identities and areas of life. Our approach 
proceeds in such a way that it takes into account lifelong aspects of a human being 
and the corresponding implications for privacy.

Solutions are known in principle for many of the envisaged problems that need to
be solved with respect to lifelong aspects, and here especially lifelong privacy.
However, some of the concrete implementations that need to be realised in the
demonstrator are currently still under development. This on the one hand explains
why certain aspects and mechanisms were only described at a very high (abstract)
level and defines on the other hand the research and development roadmap for the
remaining five months planned for the finalisation of the demonstrator.
Part II
Mechanisms for Privacy 


Introduction 


Today’s society places great demand on the dissemination and sharing of informa- 
tion. Such a great availability of data, together with the increase of the computational 
power available today, puts the privacy of individuals at great risk. The objective of 
the mechanisms activity is therefore to do novel research on the different open issues 
of the complex problem of guaranteeing privacy and trust in the electronic society. 
Chapter 5 focuses on privacy-enhancing cryptographic technologies that can be used 
in practice. The chapter presents anonymous credential schemes and their extensions
along with cryptographic applications such as electronic voting and oblivious
transfer with access control. Chapters 6 and 7 address mechanisms supporting the
privacy of the users (transparency support tools, privacy measurement) and their
electronic interactions. In particular, Chapter 6 illustrates a privacy-preserving secure
log system as an example of a transparency supporting tool, and Chapter 7 focuses 
on trust and interoperable reputation systems. Chapter 8 investigates the problem 
of assessing the degree of protection offered by published data and of protecting 
privacy of large data collections that contain sensitive information about users. The 
chapter presents an information theoretic formulation of privacy risk measures and 
describes fragmentation-based techniques to protect sensitive data as well as sensi- 
tive associations. Chapter 9 addresses the problem of providing users with means 
to control access to their information when stored at external (possibly untrusted) 
parties, presenting new models and methods for the definition and enforcement of 
access control restrictions on user-generated data. The chapter illustrates a novel 
solution based on translating the access control policy regulating data into an equiv- 
alent encryption policy determining the keys with which data are encrypted for ex- 
ternal storage. The solution is complemented by an approach based on two layers of 
encryption for delegating to the external server possible updates to the access control 
policy (without the need for the data owner to re-encrypt and re-upload resources). 


Chapter 5 
Cryptographic Mechanisms for Privacy 


Jan Camenisch, Maria Dubovitskaya, Markulf Kohlweiss, Jorn Lapon, and 
Gregory Neven 


Abstract With the increasing use of electronic media for our daily transactions, we 
widely distribute our personal information. Once released, controlling the dispersal 
of this information is virtually impossible. Privacy-enhancing technologies can help 
to minimise the amount of information that needs to be revealed in transactions, 
on the one hand, and to limit the dispersal, on the other hand. Unfortunately, these 
technologies are hardly used today. In this paper, we aim to foster the adoption of 
such technologies by providing a summary of what they can achieve. We hope that 
by this, policy makers, system architects, and security practitioners will be able to 
employ privacy-enhancing technologies. 


5.1 Introduction 


The number of professional and personal interactions we are conducting by elec- 
tronic means is increasing daily. These on-line transactions range from reading ar- 
ticles, searching for information, buying music, and booking trips, to peer-to-peer 
interactions on social networks. Thereby, we reveal a plethora of personal informa- 
tion not only to our direct communication partners but also to many other parties of 
which we are often not even aware. At the same time, electronic identification and 
authentication devices are becoming more and more widespread. They range from 
electronic tickets and toll systems, to eID cards and often get used across different 
applications. 

It has become virtually impossible to control where data about us are stored and 
how they are used. This is aggravated by the fact that storage becomes ever cheaper
and that increasingly sophisticated data mining technologies allow all of these
data to be used in many ways that we cannot even imagine today.

It is thus of paramount importance to enable individuals to protect their electronic 
privacy. Luckily, there exists a wide range of privacy enhancing technologies avail- 
able that can be used to this end. These range from privacy-aware access control 
and policy languages to anonymous communication protocols and anonymous
credential systems. The PRIME (Privacy and Identity Management for Europe)
project [PRIb] has shown that these technologies can indeed be used together to
build trust and identity management systems that allow for protecting one’s on-line
privacy and that they are ready to be applied in practice. The PrimeLife project [pria]
has taken up these research results and is concerned with bridging the gap from re- 
search to practice. 

Let us, however, note that while technology can help, users also need to learn 
about the perils of our digital world and how to guard their privacy. Of course, ICT 
systems must, to this end, provide sufficient information to the users about what is 
happening with their data. 

It seems that making use of privacy-enhancing technologies is harder than for 
other security technologies. One reason for this might be that the properties that 
they achieve are often counter-intuitive, in particular in cases of cryptographic build- 
ing blocks. In an attempt to foster the adoption of privacy-enhancing technologies 
(PETs), we overview in this paper the most important cryptographic PETs and sum- 
marise what they achieve. We also give references for their technical details. Finally, 
we explain how these technologies can be embedded into larger systems. 


5.2 Cryptography to the Aid 


There is a large body of research on specific cryptographic mechanisms that can be 
used to protect one’s privacy. Some of them are theoretical constructs, but many are 
actually fully practical and can be readily applied in practice. We here concentrate 
on the latter ones. 

The oldest types of privacy-protecting cryptography are of course encryption 
schemes by themselves: they allow one to protect information from access by third 
parties when data is stored or sent to a communication partner. There are, however, a 
number of variants or extensions of such basic encryption that have surprising prop- 
erties that can offer better protection in many use cases as we shall see. Apart from 
encrypting, one often needs to authenticate information. Typically, this is done by 
using a cryptographic signature scheme. The traditional signature schemes typically 
provide too much authentication in the sense that they are used in ways that reveal
a lot of unnecessary contextual information. The cure here is offered by so-called 
anonymous credential schemes and their extensions which we will present. Finally, 
we briefly discuss a number of cryptographic applications such as electronic voting 
schemes and privacy-enhanced access control schemes. 
5.3 Private Credentials, Their Extensions, and Applications


Certified credentials form the cornerstones of trust in our modern society. Citi- 
zens identify themselves at the voting booth with national identity cards, motorists 
demonstrate their right to drive cars with driver licenses, customers pay for their 
groceries with credit cards, airline passengers board planes with their passports and 
boarding passes, and sport enthusiasts make their way into the gym using their 
membership cards. Often such credentials are used in contexts beyond what was 
originally intended: for example, identity cards are also used to prove eligibility for 
certain social benefits, or to demonstrate being of legal age when entering a bar. 

Each of these credentials contains attributes that describe the owner of the cre- 
dential (e.g., name and date of birth), the rights granted to the owner (e.g., vehicle 
class, flight and seat number), or the credential itself (e.g., expiration date). The in- 
formation in the credentials is trusted because it is certified by an issuer (e.g., the 
government, a bank) who in its turn is trusted. 

There are a number of different ways in which such credentials can be technically
realised. Depending on their realisation, they offer more or less protection of the 
user’s privacy. For instance, they are often realised by an LDAP directory main- 
tained by the issuer. That means that a user who wants to use a credential with a 
particular party (the verifier), will have to authenticate, typically with a username 
and password, towards the verifier who will then look up the user’s credentials in 
the LDAP directory. While this realisation might satisfy the security requirement of 
the verifier and the issuer, it offers virtually no protection to the users. Apart from 
username/password being a rather insecure authentication mechanism, the user has 
1) no control over which information the verifier requests from the issuer and 2) the 
issuer learns with which verifier the user is communicating. 

A better realisation of credentials is with certificates with so-called attribute ex- 
tensions [CSF 08]. Here, the user chooses a public/secret key pair and then obtains 
a certificate from the issuer on her public key. The certificate includes all statements 
that the issuer vouches for about the user. The user can then send this certificate to 
the verifier together with a cryptographic proof of ownership of the secret key. The 
user knows which data is revealed to the verifier by the certificate, but has to re- 
veal all of the included attributes so that the verifier can check the issuer’s signature. 
Moreover, if the verifier and the issuer compare their records, they can link the user’s 
visit to the issuing of the credential by simply comparing the issuer’s signature. 

Anonymous credentials [Cha81, Bra99, CL01] (often also called private credentials
or minimal disclosure tokens) solve all these problems and indeed offer the best
privacy protection possible while offering the same cryptographic security. They 
work quite similarly to attribute certificates, the difference being that they allow 
the user to “transform” the certificate into a new one containing only a subset of 
the attributes of the original certificate. This feature is often called selective disclo- 
sure. The issuer’s signature is also transformed in such a way that the signature in 
the new certificate cannot be linked to the original signature; this is usually called 
unlinkability in the literature. 
5.3.1 Extended Functionalities


Apart from the basic features of selective disclosure and unlinkability sketched 
above, many anonymous credential systems offer additional features that can be 
very useful in practical use cases. In the following, we discuss the most important 
of these features. 


Attribute Properties 


Rather than revealing the complete value of an attribute, some credential systems 
allow the user in the transformation to apply any (mathematical) function to the 
original attribute value. For instance, if the original certificate contains a birthdate, 
the transformed attribute could contain only the decade in which the user was born. 
As a special case, the function could be boolean (meaning, having as output “true”
or “false”), so that only the truth of a statement about the attribute is revealed. For
instance, based on the birthdate in a certificate, the user could prove that she is 
between 12 and 14 years old. The schemes also allow for logical AND and OR 
combinations of such boolean expressions [CDS94]. 
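
As a purely illustrative sketch, the following Python fragment models which values a verifier would see when the user discloses functions of a birthdate rather than the birthdate itself. It involves no cryptography: in an actual credential system each disclosed value would be accompanied by a zero-knowledge proof that it was derived from the certified attribute. The function names and the example dates are our own assumptions.

```python
from datetime import date

def decade_of_birth(birthdate: date) -> int:
    """Function of the attribute: reveals only the decade, e.g. 1990."""
    return birthdate.year - birthdate.year % 10

def age_on(birthdate: date, today: date) -> int:
    """Age in full years on the given day."""
    had_birthday = (today.month, today.day) >= (birthdate.month, birthdate.day)
    return today.year - birthdate.year - (0 if had_birthday else 1)

def age_between(birthdate: date, low: int, high: int, today: date) -> bool:
    """Boolean function: reveals only whether the age lies in [low, high]."""
    return low <= age_on(birthdate, today) <= high

# Hypothetical certified attribute and date of the showing.
birthdate = date(1998, 7, 14)
today = date(2011, 3, 1)

# What the verifier learns; the birthdate itself is never part of this record.
disclosed = {
    "decade": decade_of_birth(birthdate),
    "age_12_to_14": age_between(birthdate, 12, 14, today),
    # OR combination of two boolean statements
    "teen_or_senior": age_between(birthdate, 13, 19, today)
                      or age_on(birthdate, today) >= 65,
}
```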


Verifiable Encryption 


This feature allows one to prove that a ciphertext encrypts a value that is contained 
in a credential. For instance, a service provider could offer its service to anony- 
mous users provided that they encrypt their name and address, as contained in their 
identity card, under the public key of a trusted third party, such as a judge. The 
cryptography ensures that the service provider himself cannot decrypt the name and 
address, but can rest assured that the ciphertext contains the correct value. In case 
of misuse of the service, the service provider or a law enforcement agency can then 
request the third party to decrypt the user’s name and address from the ciphertext, 
i.e., to revoke the anonymity of the user. Note that it can be decided at the time of 
showing the credential, whether or not any information in the credential should be 
verifiably encrypted, i.e., this need not be fixed at the time the credential is issued 
and can be different each time a credential is shown. 

An essential feature that we require in this setting from an encryption scheme is 
that of a label [CS03]. A label is a public string that one can attach to a ciphertext 
such that without the correct label, the ciphertext cannot be decrypted. The most 
common usage for the label in our setting is to bind the conditions and context 
under which the trusted third party is supposed to decrypt (or not decrypt) a given 
ciphertext. 

In principle, one can use any public-key encryption scheme for verifiable
encryption [CD00]. The most efficient way to do so, however, is probably to use the
Paillier encryption scheme [Pai99], for which a variant exists that is secure against
chosen-ciphertext attacks and admits efficient proof protocols [CS03]. Security against
chosen-ciphertext attacks is actually crucial in this setting: the trusted third party
essentially acts as a decryption oracle and hence semantic security is not sufficient.


Revocation of Credentials 


There can be many reasons to revoke a credential. For example, the credential and 
the related secret keys may have been compromised, or the user may have lost her 
right to carry the credential. Also, sometimes a credential might only need to be 
partially revoked. For instance, an expired European passport can still be used to 
travel within Europe, or a driver’s license revoked because of speeding could still be 
valid to prove the user’s age or address. 

One possible solution for revocation in the case of non-anonymous credentials is to
“blacklist” all serial numbers of revoked credentials in a so-called certificate
revocation list [CSF+08] that can be queried on- or off-line. Another option is to limit
the lifetime of issued credentials by means of an expiration date and to periodically
re-issue non-revoked credentials. The latter solution works for anonymous credentials
as well, although re-issuing may be more expensive than for ordinary credentials.
The former solution as such does not work, as revealing a unique serial number of
a credential would destroy the unlinkability property. However, the general principle
of publishing a list of all valid (or invalid) serial numbers can still work if,
rather than revealing their serial number, users leverage the attribute property
feature to prove that their serial number is among the list of valid serial numbers,
or that it is not among the invalid ones. A number of protocols that work along these
lines have been proposed [BS04, BDD07, NFHF09], where the solution by Nakanishi et
al. [NFHF09] seems to be the most elegant one.

Another solution inspired by revocation lists is the use of so-called dynamic
accumulators [CL02, CKS09]. Here, all valid serial numbers are accumulated (i.e.,
compressed) into a single value that is then published. In addition, dynamic
accumulators provide a mechanism that allows the user to prove that the serial number
of her credential is contained in the accumulated value. Whenever a credential is
revoked, a new accumulator value is published that no longer contains the revoked
serial number. The schemes, however, require that users keep track of the changes
to the accumulator to be able to perform their validity proofs.
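
The accumulator idea can be sketched with a toy RSA-style accumulator. The fragment below is a minimal illustration under strong simplifying assumptions: the modulus is tiny, serial numbers are small primes, and the membership check is done in the clear, whereas an anonymous credential system would prove the same relation in zero knowledge. The function names are hypothetical and do not correspond to the cited schemes' interfaces.

```python
from math import prod

# Toy parameters (assumption): in practice N is a large RSA modulus whose
# factorisation is unknown to users, and serials are mapped to large primes.
N = 61 * 53
G = 2

def accumulate(serials):
    """Compress a set of prime serial numbers into one public value."""
    return pow(G, prod(serials), N)

def witness(serial, serials):
    """Accumulate every valid serial except the user's own."""
    return pow(G, prod(s for s in serials if s != serial), N)

def verify(serial, wit, acc):
    """Membership holds if raising the witness to the serial gives acc."""
    return pow(wit, serial, N) == acc

valid = [3, 5, 7, 11]
acc = accumulate(valid)
w5 = witness(5, valid)
w7 = witness(7, valid)
ok_before = verify(5, w5, acc)

# Revoke serial 7: a new accumulator value without it is published.
valid.remove(7)
acc = accumulate(valid)
stale = verify(5, w5, acc)      # the old witness no longer matches
revoked = verify(7, w7, acc)    # the revoked credential is rejected
w5 = witness(5, valid)          # the user must update her witness
ok_after = verify(5, w5, acc)
```

The sketch also shows why users have to follow accumulator updates: after the revocation, the witness for serial number 5 is stale until it is recomputed against the new value.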

We observe that enabling revocation brings along the risk that the authority in 
control of the revocation list (or accumulator value) modifies the list to trace trans- 
actions of honest users. For instance, the authority could fraudulently include the 
serial number of an honest user in the revocation list and then check whether the sus- 
pected user succeeds in proving that her credential is not on the list. Such behaviour 
could of course be noted by, e.g., a consumer organisation monitoring changes to 
the public revocation values. 

One idea to lessen the trust that one has to put into such a third party is to use
threshold cryptography, i.e., to distribute the power to update the revocation list
over multiple entities such that a majority of them is needed to perform an update.
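
To give a flavour of how such power can be distributed, the following sketch uses Shamir secret sharing to split a hypothetical update key among five entities so that any three of them can recover it. This is only one building block and an assumption on our part: proper threshold schemes let the entities jointly authorise an update without ever reconstructing the key in one place.

```python
import secrets

P = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo P

def split(secret, threshold, parties):
    """Split `secret` into `parties` shares; any `threshold` of them suffice."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(threshold - 1)]

    def f(x):
        # Evaluate the random polynomial with constant term `secret`.
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

    return [(x, f(x)) for x in range(1, parties + 1)]

def reconstruct(shares):
    """Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

update_key = 123456789   # hypothetical key authorising revocation updates
shares = split(update_key, threshold=3, parties=5)
recovered = reconstruct([shares[0], shares[2], shares[4]])
```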


122 J. Camenisch, M. Dubovitskaya, M. Kohlweiss, J. Lapon, G. Neven 


Limited-use credentials 


Some credentials, such as entrance tickets, coupons, or cash money, can only be 
used a limited number of times. A very basic example of such credentials in the 
digital world is anonymous e-cash, but there are many other scenarios. For instance, 
in an anonymous opinion poll one might have to (anonymously) prove ownership 
of an identity credential, but each credential can only be used once for each poll. 
Another example might be an anonymous subscription for an on-line game, where
one might want to prevent the subscription credential from being used more than once
simultaneously, so that if you want to play the game with your friends, each friend
has to get their own subscription [CHK+06].

When implementing a mechanism to control the number of times that the same 
credential can be used, it is important that one can define the scope of the usage 
restriction. For instance, in the opinion poll example, the scope is the specific poll 
that the user is participating in, so that participating in one poll does not affect his 
ability to participate in another one. For electronic cash, on the other hand, the scope 
is global, so that the user cannot spend the same electronic coin at two different mer- 
chants. Overspending occurs when the same credential is used more than specified 
by the usage limit within the same scope. Possible sanctions on overspending could 
be that the user is simply denied access to the service, or that some further attributes 
from the user’s credential are revealed [CHL06, CHK+06].
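
The role of the scope can be illustrated with a simple sketch in which each use of a credential produces a token that is a deterministic function of a credential secret, the scope, and a use counter. Tokens for different scopes are unrelated, while a repeated token within one scope exposes overspending. The HMAC-based derivation and all names below are assumptions made for illustration. Real schemes additionally prove in zero knowledge that the token was derived from a valid credential, which this sketch does not attempt.

```python
import hashlib
import hmac

def usage_token(credential_secret: bytes, scope: str, counter: int = 0) -> str:
    """Deterministic per-scope token; a limit of k uses allows counters 0..k-1."""
    msg = f"{scope}|{counter}".encode()
    return hmac.new(credential_secret, msg, hashlib.sha256).hexdigest()

class Verifier:
    """Remembers the tokens it has seen in order to detect overspending."""

    def __init__(self):
        self.seen = set()

    def accept(self, token: str) -> bool:
        if token in self.seen:
            return False          # same credential, same scope, same counter
        self.seen.add(token)
        return True

secret = b"user credential secret (hypothetical)"
polls = Verifier()
first = polls.accept(usage_token(secret, "poll-A"))
again = polls.accept(usage_token(secret, "poll-A"))   # second vote, same poll
other = polls.accept(usage_token(secret, "poll-B"))   # a different scope
```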

With limited-use credentials, one can prevent users from sharing and redistribut- 
ing their credentials to a large extent. Another means of sharing prevention is the 
so-called all-or-nothing sharing mechanism [CL01]. This mechanism ensures that if
a user shares one credential with another user (which requires revealing to the other 
user the secret key material of that credential) then the other user can also use all the 
other credentials (because they are based on the same secret key material). In this 
case sharing a single credential would mean sharing one’s entire digital identity, e.g., 
including access to one’s bank account, which people probably are not prepared to 
do. If, however, one wishes to make sharing of credentials infeasible, then they need 
to be protected by tamper-resistant hardware, which we discuss next. 


Hardware Protection 


Being digital, anonymous credentials are easily copied and distributed. On the one 
hand, this is a threat to verifiers as they cannot be sure whether the person presenting 
a credential is the one to whom it was issued. On the other hand, this is also a threat 
to users as it makes their credentials vulnerable to theft, e.g., by malware. 

One means to counter these threats is to protect a credential by a tamper-resistant 
hardware device such as a smart card, i.e., to perform all operations with the creden- 
tial on the device itself. A straightforward way of doing so in a privacy-friendly way 
would be to embed the same signing key in all issued smart cards. The disadvantage 
of this approach is that if the key of one card is compromised, all smart cards have 
to be revoked. 
A more realistic approach is to implement the Camenisch-Lysyanskaya credential
system on a standard Java card [BCGS09]. However, depending on the type of
smart card, it might only be possible to process a single credential on the device. In 
this case, one could still bind other credentials to the device by including in each cre- 
dential an identifier as an attribute that is unique to the user [Cam06]. All of a user’s 
credentials should include the same identifier. (The issuing of these credentials can 
even be done without having to reveal this identifier.) When an external credential 
(i.e., a credential that is not embedded in the smart card) is shown, the verifier re- 
quires the user to not only show the external credential but also the credential on the 
smart card, together with a proof that both credentials contain the same identifier. 
Using the attribute properties feature, users can prove that both credentials contain 
the same identifier without revealing the identifier. 


5.3.2 Direct Anonymous Attestation 


How can a verifier check that a remote user is indeed using a trusted hardware mod- 
ule, without infringing on the privacy of the user, and without having to embed the 
same secret key in each module? This question arose in the context of the Trusted
Computing Group (TCG). In particular, the Trusted Platform Module (TPM) moni- 
tors the operating system and can then attest to a verifier that it is pristine, e.g., free 
of viruses and thus safe for running an application such as e-banking. To protect pri- 
vacy, the TCG has specified a scheme for this attestation that can essentially be seen 
as a group signature scheme without the opening functionality, so that anonymity 
cannot be revoked [BCC04] but with a revocation feature such that stolen keys can 
nevertheless be identified and rejected. 


5.4 Other Privacy-Enhancing Authentication Mechanisms 


There are a number of primitives that are related to anonymous credentials. Some 
of them are special cases of anonymous credentials, while others can be seen as 
building blocks or share the same cryptographic techniques to achieve anonymity. 


Blind Signatures 


A blind signature scheme [Cha83] allows a user to get a signature from the signer 
without the signer learning either the message or the resulting signature. Thus,
when the signer at some later point is presented with a valid signature on a mes- 
sage, he is not able to link it back to the signing session that produced the signature. 
Blind signature schemes are a widely used building block for schemes to achieve 
anonymity. Examples include anonymous electronic voting [Cha83, FOO91] and 
electronic cash [Cha83], which we discuss below. A large number of different blind
signature schemes have been proposed in the literature based on various crypto- 
graphic assumptions; there are too many to be listed here. 
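
As an illustration of the blinding idea, the following sketch shows a blind signature based on textbook RSA with a toy key. It is meant only to make the message flow concrete: the key is far too small to be secure, and in practice the message would be a hash of the data to be signed. The function names are our own.

```python
import secrets
from math import gcd

# Textbook RSA key of the signer (toy size, for illustration only).
N, E, D = 3233, 17, 2753

def blind(message: int):
    """User: hide the message behind a random blinding factor r."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            return (message * pow(r, E, N)) % N, r

def sign_blinded(blinded: int) -> int:
    """Signer: applies the secret key without learning the message."""
    return pow(blinded, D, N)

def unblind(blind_sig: int, r: int) -> int:
    """User: remove the blinding factor to obtain a signature on the message."""
    return (blind_sig * pow(r, -1, N)) % N

def verify(message: int, sig: int) -> bool:
    return pow(sig, E, N) == message % N

message = 1234
blinded, r = blind(message)
signature = unblind(sign_blinded(blinded), r)
```

The signer only ever sees `blinded`, which is uniformly distributed and independent of `message`, yet the unblinded result is an ordinary RSA signature on the message.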

The main feature of blind signatures is that the signer has no control whatsoever 
on the message being signed. This feature can at the same time be a drawback. 
Typically, the signer wants to impose certain restrictions on the message that he is
signing, such as the expiration date of a credential, or the denomination of a digital 
coin. When used in protocols, blind signatures therefore often have to be combined 
with inefficient “cut-and-choose” techniques, where the user prepares many blinded 
versions of the message to be signed, all but one of which are to be opened again, 
and the remaining one is used to produce the signature. A more efficient approach 
is to use partially blind signatures [AF96], where the signer determines part of the 
signed message himself, allowing him to include any type of information, such as 
the issuance or expiration date of the signature. 


Electronic cash 


The goal of (anonymous) electronic cash [Cha83] is to prevent fraud while achiev- 
ing the same privacy guarantees as offered by cash money in the real world. In 
particular, when a user withdraws an electronic coin from the bank, spends it at a 
merchant, and the merchant deposits the electronic coin at the bank, the bank cannot
link the coin back to the user. However, if either the user or the merchant tries to
cheat by spending or depositing the same coin twice, the identity of the fraudster is 
immediately revealed. 

Online electronic cash, i.e., where the bank is online at the moment a coin is
spent, can be built using blind signatures by having the bank blindly sign random 
serial numbers. After having issued the blind signature to a user, the bank charges 
the user’s account. The user can spend the money with a merchant by giving away 
the random serial number and the signature. To deposit the coin, the merchant for- 
wards the serial number and signature to the bank, who verifies the signature and 
checks whether the serial number has been deposited before. If not, the bank credits 
the merchant’s account; if so, the bank instructs the merchant to decline the trans- 
action. 

In off-line electronic cash [CFN88], the bank is not involved when the coin is 
spent, only when it is withdrawn or deposited. The techniques described above are 
therefore enhanced to, at the time of deposit, distinguish between a cheating user and 
a cheating merchant, and in the former case, to reveal the identity of the cheating 
user. Both online and off-line electronic anonymous cash can be seen as special 
cases of limited-use anonymous credentials as described above, where a single scope 
is used for all payments. To obtain off-line electronic cash, the user is required to 
provide a verifiable encryption of her identity, which is only decrypted in case of 
fraud. 
Group Signatures


A group signature scheme [CvH91] allows group members to sign messages in a 
revocably anonymous way, meaning that any verifier can tell that the message was 
signed by a group member, but not by which group member, while a dedicated 
opening manager can lift the anonymity of a signature and reveal the identity of the 
signer who created it. Group membership is controlled by a central group manager, 
who generates the group’s public key and provides the individual members with 
their secret signing keys. Some schemes combine the roles of group manager and 
opening manager in a single entity. 

Group signatures satisfy a whole range of security properties, including
unforgeability (i.e., no outsider can create valid signatures in the name of the group),
unlinkability (i.e., signatures by the same signer cannot be linked), anonymity (i.e.,
nobody except the opening manager can tell which signer created a signature),
traceability (i.e., any valid signature can be traced back to a signer), exculpability
(i.e., no collusion of cheating signers can create a signature that opens to an honest
signer), and non-frameability (i.e., not even a cheating group manager can create
a signature that opens to an honest signer). Many of these properties are in fact
related [BMW03, BSZ05].

The showing protocol of many anonymous credential systems follows a typical 
three-move structure that allows them to be easily converted into a signature scheme 
by means of a hash function [FS87]. The resulting signature scheme inherits all the 
anonymity features of the credential system. A group signature scheme can then 
be obtained by combining it with verifiable encryption: the issuer plays the role of 
group manager and issues to each group member a credential with a single attribute 
containing his identity. Group members do not reveal their identity attribute when 
signing a message, but verifiably encrypt it under the public key of the opening 
manager. One can take this approach even further by including more attributes and 
using the attribute properties feature. For example, one could create a signature that 
reveals that one authorised group member between 18 and 25 years old signed the 
message, but only the opening manager can tell who exactly did. 
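
The conversion of a three-move protocol into a signature scheme by means of a hash function can be illustrated with the Schnorr protocol, used here as a simple stand-in for the considerably richer showing protocol of a credential system. The group parameters are toy values chosen for readability, and the function names are assumptions.

```python
import hashlib
import secrets

# Toy group (assumption): P = 2Q + 1 with P and Q prime; G has order Q.
P, Q, G = 2039, 1019, 4

def challenge(commitment: int, message: bytes) -> int:
    """The hash function takes over the role of the verifier's challenge."""
    data = commitment.to_bytes(2, "big") + message
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)

def sign(x: int, message: bytes):
    k = secrets.randbelow(Q - 1) + 1
    t = pow(G, k, P)                 # move 1: commitment
    c = challenge(t, message)        # move 2: challenge, computed by hashing
    s = (k + c * x) % Q              # move 3: response
    return t, s

def verify(y: int, message: bytes, sig) -> bool:
    t, s = sig
    return pow(G, s, P) == (t * pow(y, challenge(t, message), P)) % P

x, y = keygen()
sig = sign(x, b"signed on behalf of the group")
valid = verify(y, b"signed on behalf of the group", sig)
```

Because the challenge depends on the message, the transcript of the protocol becomes a signature on that message that anyone can check without interacting with the signer.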


Ring Signatures 


One possible disadvantage of group signatures is that the group manager decides on 
the composition of the group, and that members can only sign in the name of that 
group. Ring signatures [RST01] are a more flexible variant of group signatures that
have no group manager or opening manager. Rather, users can determine the group 
of “co-signers” at the time a signature is created. The co-signers’ collaboration is 
not needed in the signing process, so in fact, they need not even be aware that they 
are involved in a ring signature. There is no authority to reveal the identity of the 
signer behind a ring signature, but some schemes allow the signer to voluntarily 
prove that they created a signature. 
Redactable and Sanitisable Signatures


In some applications, it may be necessary to hide words, sentences, or entire paragraphs
of a signed document without invalidating the original signature. This is exactly
what redactable [JMSW02] and sanitisable [ACdMT05] signatures allow one
to do, the difference being that in the former anyone can censor a document, while 
in the latter only a censoring authority designated by the original signer can do so. 
Both primitives satisfy a privacy property implying that it is impossible to link back 
a censored signature to the original signature that was used to create it. 


5.4.1 Privacy-Enhancing Encryption 


While the main focus of this work is on privacy-enhancing authentication, a com- 
plete privacy-friendly infrastructure also involves special encryption mechanisms. 
We already touched upon verifiable encryption in relation to anonymous creden- 
tials. We discuss a selection of other privacy-relevant encryption primitives here. 


Anonymous Communication 


Most of the anonymous authentication mechanisms described above rely on an 
anonymous underlying communication network: cryptographic unlinkability of sig- 
natures clearly does not help if the users are identifiable by their IP address. Mix 
networks [Cha81] can be used to obfuscate which user communicates with which 
servers by routing the traffic through an encrypted network of mix nodes. The exact 
route that a packet follows can either be decided by the mix node or by the sender of 
the packet. In the latter case, the message is wrapped in several layers of encryption, 
one layer of which is peeled off at each node; this process is often referred to as 
onion routing [Cha81, GRS99, CL05]. So-called dining cryptographer networks or
DC-nets [Cha88] even hide whether entities are communicating at all, but
they of course incur a constant stream of dummy traffic between all participants in 
doing so. 
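
A single round of a DC-net can be sketched in a few lines. Every pair of participants shares a random pad, each participant announces the XOR of its pads, and the sender additionally XORs in the message. Since every pad appears in exactly two announcements, the pads cancel and only the message remains, without any indication of who contributed it. The function name and its parameters are hypothetical, and the sketch ignores collisions between several senders as well as disruption by malicious participants.

```python
import secrets
from itertools import combinations

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def dc_net_round(n_parties: int, sender: int, message: bytes) -> bytes:
    """Run one round and return the XOR of all public announcements."""
    size = len(message)
    # Every pair of participants shares a fresh random pad.
    pads = {pair: secrets.token_bytes(size)
            for pair in combinations(range(n_parties), 2)}

    announcements = []
    for i in range(n_parties):
        out = bytes(size)
        for pair, pad in pads.items():
            if i in pair:
                out = xor(out, pad)
        if i == sender:
            out = xor(out, message)   # only the sender mixes in the message
        announcements.append(out)     # non-senders still emit dummy traffic

    combined = bytes(size)
    for a in announcements:
        combined = xor(combined, a)   # each pad occurs twice and cancels out
    return combined

revealed = dc_net_round(n_parties=4, sender=2, message=b"hello")
```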


Homomorphic and Searchable Encryption 


With current technology trends such as software as a service and cloud computing, 
more of our information is stored by external services. Storing the information in 
encrypted form is often not an option, as it ruins either the service’s functionality 
or its business model. As the main goal of encryption is to hide the plaintext, it 
usually destroys any structure present in the plaintext; tampering with a ciphertext 
either renders it invalid, or turns the plaintext into unpredictable random garbage. 
Some encryption algorithms, however, are homomorphic, in the sense that applying
certain operations on ciphertexts has the effect of applying other operations on the
plaintexts. One can thereby process encrypted data without decrypting it, so that for
example a server can apply data mining mechanisms directly on encrypted information
[OS07]. There exist homomorphic encryption schemes that support multiplication
[ElG85] and addition [Pai99] of plaintexts, and, more recently, also schemes
that support both at the same time [Gen09].
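
The additive homomorphism of the Paillier scheme [Pai99] can be demonstrated with a toy implementation. The key below is far too small for any real use, and the code omits everything a practical library would add, such as input validation and constant-time arithmetic. It is a sketch under these assumptions, with function names of our own choosing.

```python
import secrets
from math import gcd

# Toy key (assumption); real deployments use primes of a thousand bits or more.
p, q = 61, 53
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)

def encrypt(m: int) -> int:
    """Randomised encryption with generator g = n + 1."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n) * mu % n

def add(c1: int, c2: int) -> int:
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % n2

# A server can compute the encrypted sum without seeing 20 or 22.
total = decrypt(add(encrypt(20), encrypt(22)))
```

Here `total` equals 42, although the party calling `add` only ever handles ciphertexts.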

In similar scenarios it can be useful if a server can search through encrypted in- 
formation without having to decrypt it. For example, this would enable an encrypted 
email hosting server to perform efficient searches on your email and transmit only 
the matching (encrypted) emails. Special-purpose schemes have been developed for 
this purpose as well, both in the symmetric [SWP00] and the asymmetric [BCOP04]
setting. 
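
The email example can be sketched with a deliberately simplified symmetric approach in which the client stores a keyed tag for each keyword alongside the (separately encrypted) emails. To search, the client sends the tag of the keyword, and the server matches tags without learning the keyword. This is not the construction of the cited schemes: deterministic tags reveal which emails share keywords, a leakage that proper searchable encryption schemes aim to limit. All names are hypothetical.

```python
import hashlib
import hmac

def keyword_tag(key: bytes, word: str) -> bytes:
    """Keyed tag of a keyword; also serves as the search trapdoor."""
    return hmac.new(key, word.lower().encode(), hashlib.sha256).digest()

def build_index(key: bytes, emails: dict) -> dict:
    """Client side: map each email id to the set of tags of its words."""
    return {eid: {keyword_tag(key, w) for w in text.split()}
            for eid, text in emails.items()}

def server_search(index: dict, trapdoor: bytes) -> list:
    """Server side: compare the trapdoor with stored tags only."""
    return sorted(eid for eid, tags in index.items() if trapdoor in tags)

key = b"client search key (hypothetical)"
emails = {1: "quarterly patent filing", 2: "lunch on friday", 3: "patent review"}
index = build_index(key, emails)     # uploaded together with the ciphertexts
hits = server_search(index, keyword_tag(key, "patent"))
```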


Oblivious Transfer 


Imagine a database containing valuable information that is not sold as a whole, 
but that rather charges customers per accessed record. At the same time, the list of 
queried records reveals sensitive information about the customers’ intentions. For 
example, a company’s search queries to a patent database or to a DNA genome 
database may reveal its research strategy or future product plans. 

An oblivious transfer protocol [Rab81] solves this apparently deadlocked situa- 
tion by letting a client and server interact in such a way that the server does not learn 
anything about which record the client obtained, while the client can only learn the 
content of a single record. The adaptive variant [NP99] of the primitive can amortise 
communication and computation costs over multiple queries on the same database. 
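
For intuition, the following sketch implements a well-known textbook 1-out-of-2 oblivious transfer based on RSA. It is neither the original protocol of [Rab81] nor the adaptive variant of [NP99], and it is secure at best against parties who follow the protocol. The toy key and the function names are assumptions.

```python
import secrets

# Server's textbook RSA key (toy size, for illustration only).
N, E, D = 3233, 17, 2753

def server_offer():
    """Server: one random value per record."""
    return [secrets.randbelow(N), secrets.randbelow(N)]

def client_choose(offer, choice: int):
    """Client: blind the random value of the chosen record with k^E."""
    k = secrets.randbelow(N)
    v = (offer[choice] + pow(k, E, N)) % N
    return v, k

def server_respond(records, offer, v: int):
    """Server: mask every record; only one mask equals the client's k."""
    return [(m + pow((v - x) % N, D, N)) % N for m, x in zip(records, offer)]

def client_recover(masked, choice: int, k: int) -> int:
    return (masked[choice] - k) % N

records = [1111, 2222]     # two records, encoded as numbers below N
offer = server_offer()
v, k = client_choose(offer, choice=1)
masked = server_respond(records, offer, v)
received = client_recover(masked, 1, k)
```

The server sees only `v`, which looks random for either choice, while the client can unmask only the record whose mask it knows.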


5.5 Electronic Voting, Polling, and Petitions 


Voting privacy is more than just a desirable feature, it is a fundamental principle for 
a democratic election. Electronic voting schemes have been proposed based on mix 
networks [Cha81], based on homomorphic encryption [CF85], and based on blind 
signatures [FOO92]. Electronic voting schemes form the backbone of e-democracy 
and should be properly designed and verified to guarantee a variety of security prop- 
erties, such as end-to-end verifiability, voter anonymity, as well as coercion and re- 
ceipt freeness. 

Other mechanisms such as electronic petitions and verifiable electronic opinion 
polls aim at strengthening participatory democracy. The limited-use restrictions of 
anonymous credentials make them applicable to such scenarios. As discussed in
Section 5.3.1, the scope of an anonymous credential with limited show in an opin- 
ion poll or e-petition system is the identifier of the poll/petition that the user wants 
to participate in. In this way, anonymous credentials with limited show functionality
restrict a user to signing only once for a specific poll, without affecting her
ability to participate in other polls/petitions.

Several demonstrators based on this basic idea have been built to show the feasibility
of this approach [DKD+09, BP10, VLV+08, TBD+10]. The last of these also implemented
parts of the anonymous credential protocol in a chip card using software
technology and hardware [jav10] similar to the one used in European identity cards.
The deployment of such a system would bind an electronic petition signing directly
to a European citizen. The properties of the anonymous credential system would allow
for further restrictions to the scope of the petition. For instance, for local issues
it would be required to be a resident of a particular district in order to be able to 
participate in the petition. Moreover, with this technology, it is simple to extend the 
application such that a poll may include restrictions on who can participate (e.g., 
only persons older than 18). Optionally, the user may selectively disclose attributes 
or properties of those attributes (e.g., an age interval) that may be used for statistics. 


5.6 Oblivious Transfer with Access Control and Prices 


The techniques described above can be combined in various ways to address interesting
business needs. For example, imagine that each record in a patent or DNA
database as described above is protected by a different access control policy, de- 
scribing the roles or attributes that a user needs to have in order to obtain it. By 
combining anonymous credentials with adaptive oblivious transfer protocols, one 
can construct solutions where the user can obtain the records she is entitled to, without
revealing the applicable access control policy to the database, or which roles she
has [CDN09]. By another combination of such techniques, the database can attach
a different price to each record, and let users only download as many records as
their prepaid balance allows, all while remaining completely anonymous [CDN10].


Oblivious Transfer with Access Control 


Consider the case of access to a database where the different records in the database 
have different access control conditions. These conditions could be certain at- 
tributes, roles, or rights that a user needs to have to access the records. The as- 
signing of attributes to users is done by a separate entity called the issuer, external 
to the database. To provide the maximal amount of privacy, a protocol is required 
such that: 


• Only users satisfying the access conditions for a record can access that record;
• The service (database) provider does not learn which record a user accesses;
• The service (database) provider shall not learn which attributes, roles, etc. a user
  has when she accesses a record, i.e., access shall be completely anonymous, nor
  shall it learn which attributes the user was required to have to access the record.
To fulfill all the above requirements, we construct an Oblivious Transfer with
Access Control (AC-OT) protocol [CDN09], which is based on the oblivious transfer
protocol by Camenisch et al. [CNS07] and works as follows. Each record in the
database has an access control list (ACL). The ACL is a set of categories. We note 
that the name “category” is inspired by the different data categories that a user is 
allowed to access. However, the category could just as well encode the right, role, 
or attribute that a user needs to have in order to access a record. 

The database server first encrypts each record with a unique key and publishes 
these encryptions. The encryption key is derived from the index of the record, the 
ACL of the record, and a secret of the database server. Although the secret of the 
database is the same for all record keys, it is not possible to derive the encryption 
key for one record from that of another record. Thus, to decrypt a record the user 
needs to retrieve the corresponding key from the server. 
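
The key derivation can be pictured as follows. The sketch uses HMAC as a stand-in for the derivation function, which is an assumption made purely for illustration: the actual protocol uses an algebraic derivation that allows the key to be retrieved obliviously together with a zero-knowledge proof, which a hash-based function does not support. What the sketch does show is that a single server secret yields per-record keys that cannot be computed from one another.

```python
import hashlib
import hmac

def record_key(server_secret: bytes, index: int, acl: list) -> bytes:
    """Derive a per-record key from the index, the ACL and the server secret."""
    label = f"{index}|{','.join(sorted(acl))}".encode()
    return hmac.new(server_secret, label, hashlib.sha256).digest()

# Hypothetical secret and category names.
server_secret = b"database master secret (hypothetical)"
k1 = record_key(server_secret, 1, ["genomics", "premium"])
k2 = record_key(server_secret, 2, ["genomics", "premium"])
k1_again = record_key(server_secret, 1, ["premium", "genomics"])
```

Binding the ACL into the key means that a key obtained for one set of categories is useless for a record that carries the same index but a different ACL.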

To be able to do this, the user has to obtain the necessary credentials from the 
issuer. Each anonymous credential [Cha85, LRSW99, CL01] issued to a user certifies
a category of records the user is allowed to access. Recall that anonymous credentials
allow the user to later prove that she possesses a credential without revealing
any other information whatsoever. Also, anonymous credential systems provide different
revocation mechanisms. Note that if a record has several categories attached
to it, then the user must have a credential for all of these categories, basically
implementing an AND condition. If one wanted to specify an OR condition, one
could duplicate the record in the database with a second set of categories. 

To obliviously access a record for which the user has the necessary credentials, 
she engages in a transfer protocol with the database and while retrieving a key, 
gives a zero-knowledge proof of knowledge that she possesses credentials for all the
categories that are encoded into the key that she wants to retrieve. If she succeeds 
then she can decrypt that record, otherwise, she cannot. The database learns noth- 
ing about the index of the record that is being accessed, nor about the categories 
associated to the record. 


Priced Oblivious Transfer with Rechargeable Wallets 


Now consider a database where each record may have a different price, for example, 
a DNA or patent database, as described above. In this setting, it is necessary to prevent
the database from gathering information about a customer’s shopping behaviour, 
while still allowing it to correctly charge customers for the purchased items. 

To solve this problem we propose the first truly anonymous priced oblivious 
transfer protocol (POT) [CDN10], where customers load money into their pre-paid 
accounts, and can then start downloading records so that: 


• The database does not learn which record is being purchased, nor the price of the
  record that is being purchased;
• The customer can only obtain a single record per purchase, and cannot spend
  more than his account balance;
• The database does not learn the customer’s remaining balance; and
• The database does not learn any information about who purchases a record.


We note that previous POT protocols [AIR01, Tob02, RKP09] do not provide full
anonymity (the last requirement) for the customers: the database can link transac- 
tions of the same customer. Furthermore, they also lack a recharge functionality: 
once a customer’s balance does not contain enough credit to buy a record, but is 
still positive, the customer cannot use up the balance, but will have to open a new 
account for further purchases. Even if the protocol can be extended so that the cus- 
tomer can reveal and reclaim any remaining credit, he will leak information about 
his purchases by doing so. In our protocol, customers can recharge their balances 
anonymously at any time. 

In addition, we provide an enhanced protocol where records are transferred using
an optimistic fair exchange protocol [ASW97, ASW00], thereby preventing a
cheating database from decreasing a customer’s wallet balance without sending the
desired record.

In the Priced Oblivious Transfer with Rechargeable Wallets protocol, as in
the AC-OT protocol, the database provider encrypts the entire database and
publishes the encrypted version. Each record is encrypted with a unique key that
is derived from its index and its price.

To be able to access records, a customer first contacts the provider to create a 
new, empty wallet. Customers can load more money into their wallet at any time, 
using an anonymous e-cash scheme, for example. 

When a customer wants to purchase a record with index i and price p_i from the
database, the provider and the customer essentially run a two-party protocol, at the
end of which the customer will have obtained the decryption key for the record i
as well as an updated wallet with a balance of p_i units less. This is done in such
a way that the provider does not learn anything about i or p_i. More precisely, we
model wallets as one-time-use anonymous credentials with the balance of the wallet
being encoded as an attribute. When the customer buys a record (or recharges her
wallet), she basically uses the credential and gets in exchange a new credential with
the updated balance as an attribute, without the provider learning anything about the
wallet’s balance. The properties of one-time-use credentials ensure that a customer
cannot buy records worth more than what she has (pre-)paid to the provider.
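
The bookkeeping behind one-time-use wallets can be illustrated with a small sketch. It is not the protocol of [CDN10]: there is no cryptography here, and the provider sees balances and prices in the clear, whereas in the real protocol both stay hidden inside the credential. The names Wallet, Provider, recharge and purchase are hypothetical and chosen only for this illustration. The sketch shows just the invariant that each wallet can be used once and is exchanged for a fresh one with an updated balance.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Wallet:
    serial: str   # one-time serial number, revealed when the wallet is used
    balance: int  # assumption: visible here; a hidden attribute in the real protocol


class Provider:
    def __init__(self):
        self.spent = set()  # serials of wallets that have already been used

    def new_wallet(self) -> Wallet:
        # A customer starts with a new, empty wallet.
        return Wallet(secrets.token_hex(16), 0)

    def _consume(self, wallet: Wallet) -> None:
        # One-time use: a serial number may be shown only once.
        if wallet.serial in self.spent:
            raise ValueError("wallet already used")
        self.spent.add(wallet.serial)

    def recharge(self, wallet: Wallet, amount: int) -> Wallet:
        self._consume(wallet)
        return Wallet(secrets.token_hex(16), wallet.balance + amount)

    def purchase(self, wallet: Wallet, price: int) -> Wallet:
        # Check the balance first so that a failed purchase does not burn the wallet.
        if price > wallet.balance:
            raise ValueError("insufficient balance")
        self._consume(wallet)
        return Wallet(secrets.token_hex(16), wallet.balance - price)
```

Because every operation retires the old serial number and issues a new wallet, replaying an old wallet with a higher balance is rejected, which is the property that prevents overspending.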

To sum up, we construct protocols that allow users to obtain records from the 
database they are entitled to (by the access control rules and/or by purchasing a 
record), and at the same time provide full anonymity for the users and prevent the 
database from gathering any statistics about their transactions. 


5.7 Oblivious Trusted Third Parties 


Anonymous/private credentials allow for implementing electronic transactions that 
are unlinkable and for selectively disclosing the minimal amount of information 
about the user. At the same time, these transactions have to be accountable. When
using anonymous credentials, transactions are automatically accountable in the
sense that the verifier is assured that what is being proven during the credential
show is indeed vouched for by the issuer. However, many real-life applications have
to consider exceptional cases in which additional information is required in case of
a malicious transaction.

When the conditions for detecting such abuse can be expressed mathematically
and can be detected inside the electronic system, one can often mitigate such
malicious transactions cryptographically. Examples of systems that do so are e-cash
systems that resist offline double spending and money laundering, as well as the
e-petition system sketched above.

In other situations, e.g., when a suspect might have used an anonymous credential
for physical access control to a crime scene, the evidence that additional information
is allowed to be recovered, e.g., the identity of all users that accessed the premises
during a certain time period, lies outside of the system. The simplest solution is to
reveal a verifiable encryption of this information during the show of the credential.

In particular, a user U would encrypt her true identity with the public key of
the anonymity revocation authority RA, a form of trusted third party (TTP), and
provide this encrypted data to a service provider SP. She then convinces SP in a
zero-knowledge proof of knowledge that this encrypted data contains her valid user
identity, which can be opened by the authority if it decides that the opening request is
legitimate.

This solution, however, raises several concerns. 


1. It involves a fully trusted party, the revocation authority, that is able to link all 
transactions with no graceful degradation of privacy and security, should the re- 
vocation authority become compromised. 

2. Additionally, the solution does not provide the best achievable accountability 
properties, as especially powerful users could bribe or threaten the RA such that 
it would refuse to open particular ciphertexts. 

3. Honest service providers find the traditional system cumbersome because of the
need to involve such highly trusted authorities for even minor dispute cases. For
example, bringing a case to law enforcement in the real world is likely to have a
non-trivial cost, both in the time required and in support from legal counsel.


There are two avenues that can be followed to reduce the trust placed in a trusted
third party (TTP) like the revocation authority. One is to distribute the TTP such
that it does not run on a single machine but on multiple machines. Each machine 
is owned by an organisation that is unlikely to collaborate against the user with 
the other organisations (e.g., a privacy office, the interior ministry, and the justice 
ministry). The cryptographic protocol that replaces the TTP guarantees that as long 
as one of these multiple machines is uncompromised and operates correctly, the 
other machines cannot infringe the user’s privacy.


Oblivious Anonymity Revocation


The other approach that we describe here is to design the protocol in such a way 
that the TTP is as oblivious as possible to the task it performs, e.g., it does not 
know which user’s identity it helps to reveal: in our implementation the identity of 
the user would be protected by two layers of encryption. The revocation authority 
can only remove the outer layer of encryption. The second layer is removed by the 
service provider himself once he receives the partial decryption from the revocation 
authority. 
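
The layering idea can be illustrated with a toy sketch. It deliberately swaps in a symmetric XOR keystream (derived with SHAKE-256) for the verifiable public-key encryption that the actual scheme relies on, so it is neither secure nor a description of our implementation; the function names and keys are hypothetical. It shows only the order of operations: the user applies two layers, the revocation authority strips the outer one without seeing the identity, and the service provider strips the inner one.

```python
import hashlib


def xor_layer(key: bytes, data: bytes) -> bytes:
    # Toy stand-in for one encryption layer: XOR with a key-derived keystream.
    # Applying the same layer twice removes it again.
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def user_encrypt(identity: bytes, sp_key: bytes, ra_key: bytes) -> bytes:
    inner = xor_layer(sp_key, identity)   # inner layer, removable by SP only
    return xor_layer(ra_key, inner)       # outer layer, removable by RA only


def ra_partial_decrypt(ciphertext: bytes, ra_key: bytes) -> bytes:
    # The RA removes only the outer layer; the result is still protected.
    return xor_layer(ra_key, ciphertext)


def sp_final_decrypt(partial: bytes, sp_key: bytes) -> bytes:
    # The SP removes the remaining layer after receiving the partial decryption.
    return xor_layer(sp_key, partial)
```

In this sketch the value handled by the revocation authority never equals the plaintext identity, which mirrors the obliviousness goal as long as the two parties do not collaborate.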

This Oblivious Trusted Third Parties (OTTP) mechanism helps to achieve some 
amount of graceful degradation. Even if the revocation authority is compromised, it 
cannot learn any useful information. Here we assume that there is no collaboration 
between the service provider and the revocation authority. 

Another aspect in which the revocation authority can be made oblivious is in 
terms of the information it receives from the service provider. We want to make 
sure that the original ciphertexts are labeled with the revocation condition but are 
otherwise only known to the service provider, i.e., they look random to all possible 
collusions between users and the revocation authority. This guarantees that powerful 
users with special interests have no way of influencing the revocation authority to 
selectively open only some of the opening requests. 

In contrast to the fully trusted third party as discussed above, this scheme al- 
leviates the trust assumptions on the TTP, and provides both stronger privacy and 
stronger accountability. The OTTP revocation authority is a weaker TTP, whose 
only trust requirement is to revoke the anonymity of users only in those situations 
in which the revocation condition indeed holds. To achieve this, the scheme restricts 
the revocation authority to only process blinded information, unknown to users, and 
to output blinded information that can only be decrypted by the service provider. 

As a result, RA cannot block requests of SP selectively and cannot collude 
against any specific user, nor can it link the transactions of users in the system. Fur- 
thermore, a compromised authority remains restricted in the information it could 
possibly gather, i.e., it can only gather information if the service provider of a par- 
ticular transaction consents to remove the remaining blinding. 

Essentially, oblivious anonymity revocation resolves most of our concerns stated 
above. Nevertheless, in many scenarios, the cost of proving that a request for 
anonymity revocation is legitimate is not proportional to the compensation that the 
service provider gets. 

A simple example is the following: to use a service, an anonymous user has to pay 
a small fee within 30 days. If the user, however, fails to do this, the service provider 
has to prove the non-payment towards the revocation authority in order to obtain the 
user’s identity and take action. Distributing the revocation authority across multi- 
ple machines owned by different organisations does not solve this problem; on the 
contrary, now all of these organisations have to check non-payment, which further 
increases the costs for the service provider.


Oblivious Satisfaction Authority


In scenarios similar to the aforementioned example, it is often easier for the user 
to prove satisfaction, than for the service provider to do the opposite. Therefore, 
in our new approach, we shift some responsibilities from the service provider to 
the user. Instead of the service provider having to prove to the revocation authority 
that the revocation conditions have been met, it is the user’s responsibility to prove 
that the satisfaction conditions have been fulfilled. This change facilitates a far less 
complicated resolution of disputes and conflicts, which is both more economical for 
the service provider and more privacy preserving for the user. 

The approach is as follows: upon the user’s request, an Oblivious Satisfaction Au- 
thority (SA) verifies the satisfaction of a specific condition with respect to a specific 
service, and provides the user with a satisfaction token. The satisfaction authority 
can be made oblivious in the sense that the SA must not be able to link a user’s sat- 
isfaction transaction with the user’s transaction at the service provider. Moreover, 
even if the oblivious satisfaction authority and the oblivious revocation authority 
collude, they should not be able to link satisfaction requests with opening requests. 
This is achieved in a similar way as for the oblivious RA: the satisfaction token is in 
fact double encrypted, and the satisfaction authority is only able to remove the outer 
layer, while only the user is able to remove the final blinding. 

After unblinding the satisfaction token received from SA, the user publishes this 
token, proving satisfaction towards the revocation authority. In other words, when 
the service provider requests the user’s identity, he has to provide the same satisfac- 
tion token to the revocation authority. Now, the revocation authority only discloses 
the (blinded) identity to the service provider if the corresponding satisfaction token 
has not been published before some predefined date. If the user, however, decides 
not to fulfill the contract, and as such cannot publish the corresponding satisfaction 
tokens, the revocation authority discloses the user’s identity to the service provider. 
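
The decision rule applied by the revocation authority can be written down as a short sketch. The function name, the representation of published tokens as a dictionary from token to publication date, and the treatment of the deadline day as inclusive are assumptions made only for this illustration.

```python
from datetime import date


def ra_should_disclose(token: str, published: dict, deadline: date) -> bool:
    """Return True if the RA would release the (blinded) identity to the SP."""
    published_on = published.get(token)
    # Assumption: a token published on the deadline day still counts as in time.
    satisfied_in_time = published_on is not None and published_on <= deadline
    return not satisfied_in_time
```

Since this check needs only the list of published tokens and a date, it can be carried out mechanically, without any judgement about the underlying dispute.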

Since the satisfaction tokens can be machine verified, the involvement of the
revocation authority can be reduced significantly and expensive external authorities
such as law enforcement become obsolete. This combined approach with oblivious
revocation and oblivious satisfaction authorities better serves the needs of service
providers, as it keeps the process of revocation and the dependency on external
revocation authorities to a minimum. Furthermore, it provides better privacy guarantees
towards the user than the solution with a fully trusted revocation authority.

To achieve this, the scheme restricts the satisfaction authority to only process
blinded information, unknown to service providers, and to output blinded information
that can only be decrypted by the user. As a result, SA cannot block requests of
U selectively even when under pressure by the service provider, and it cannot collude
against any specific user, nor can it link the transactions of users in the system.

These strong guarantees not only protect the user but also simplify
privacy-friendly transactions. In particular, we can implement a form of anonymous
payment based on credit cards rather than anonymous e-cash. When satisfying the
payment condition towards the satisfaction authority, the user is identified (through
his credit card number); however, because of the unlinkability guarantee, his
transaction with the service provider remains anonymous.


5.8 Conclusion 


Even though a large number of very advanced privacy-enhancing cryptographic 
primitives have been proposed in the literature, their way to broad-scale deployment 
in the real world still presents a number of challenges. 

One is the design of user interfaces that capture the core concepts of the under- 
lying cryptography, while hiding the details. 

Another challenge is the integration of the cryptographic primitives in the over- 
all (authentication and access control) infrastructure. For instance, to deploy anony- 
mous credentials, one needs proper policy languages to express and communicate 
the access control requirements in a way that supports, e.g., selective revealing of 
attributes, or proving properties of attributes. Too often do such languages implic- 
itly assume that the user reveals all of her attributes by default. Moreover, since 
credential attributes are often sensitive information, these policy languages have to 
be integrated with privacy policy languages in which servers can express how the 
revealed information will be treated, and for users to express to whom and under 
which circumstances they are willing to reveal it. Privacy policy languages such as 
P3P [W3C06] are a first step, but are often not fine-grained enough, and lack the 
tight integration with access control policies. These and other challenges are
currently being addressed as part of the PrimeLife project [pria, CMN+10].

From a cryptographic perspective, there are still many open problems to be ad- 
dressed. Researchers are searching for more efficient primitives, since the incurred 
overhead is still prohibitive in many applications. Also, dedicated protocols for ad- 
vanced applications like social networks or location-based services would be de- 
sirable. From a theoretical point of view, an important challenge is how existing 
primitives can be securely and efficiently composed to build new, more complex 
primitives. Finally, most of the above primitives currently still lack proper key man- 
agement infrastructures so that keys can be securely stored, authenticated, and re- 
voked. 


Chapter 6 
Transparency Tools 


Hans Hedbom, Tobias Pulls, and Marit Hansen 


Abstract The increasing spread of personal information on the Internet calls for
new tools and paradigms to complement the concealment and protection paradigms.
One such suggested paradigm is transparency and the associated transparency
enhancing tools, which make it possible for Data Subjects to track and examine how their
data have been used, where they originate and what personal data about them
Data Controllers have stored. One such tool needed in order to track events related
to personal data is a log system. Such a log system must be constructed in such a 
way that it does not introduce new privacy problems. This chapter describes such a 
log system that we call a privacy preserving secure log. It outlines the requirements 
for the system and describes and specifies a privacy preserving log system that has 
been developed and implemented within the PrimeLife project. 


6.1 Introduction 


As users of information technology, we are today giving away more and more per- 
sonal information to many different actors. The personal information we expose can 
easily be transferred to third parties or harvested from public sources without the 
control (or consent) of the Data Subject. This situation, in combination with the
fact that emerging technologies such as sensor networks and ambient intelligence
environments make it even harder to control when or what information is collected,
has made several researchers argue that the paradigm of concealment
and data minimization is no longer enough to protect or limit the exposure and
release of personal data [Hil09, SSA06, WABL+06]. The solution that is suggested in
[Hil09, SSA06, WABL+06] is to increase the transparency of how data is collected,
stored and used in order to make it possible for the Data Subject to make informed
decisions, to monitor the usage of her data, and to check that agreed policies are honoured.

There are also legal provisions for transparency. In the European data protection
framework, several provisions exist that support or demand transparency. Among
others, Art. 12 Data Protection Directive 95/46/EC specifies the legal provision that
grants every person the right to access, i.e., the right to obtain from the controller, 
a confirmation as to whether data relating to him are being processed and informa- 
tion about the purposes of the processing, the categories of data concerned, and the 
recipients or categories of recipients to whom the data are disclosed. This is the ba- 
sic provision for exercising other Data Subject rights such as rectification, erasure, 
or blocking of data (cf. Art. 12). Further, in any automatic processing of data con- 
cerning him, at least in the case of the automated decisions, Data Controllers should 
grant every Data Subject the knowledge of the logic involved. In addition to the Data 
Protection Directive 95/46/EC, other regulations may demand further information 
or notification processes. Particularly notable is the personal data breach notification that
is laid down in Art. 4 No 3 e-Privacy Directive 2009/136/EC: “In the case of a 
personal data breach, the provider of publicly available electronic communications 
services shall, without undue delay, notify the personal data breach to the compe- 
tent national authority. When the personal data breach is likely to adversely affect 
the personal data or privacy of a subscriber or individual, the provider shall also 
notify the subscriber or individual of the breach without undue delay.” Summarising 
the mentioned provisions, the need for transparency for Data Subjects is apparent. 
This requires that the Data Controller has transparency of the data processing, which 
should be taken for granted, but is often not the case. 

The technical and legal tools needed to achieve this type of transparency have
been named “Transparency Enhancing Tools (TETs)” [Hil09]. There are several
different definitions of what a TET is; our understanding of what constitutes
a technical TET, based on a definition in [Hed09], is that a TET is a
technical tool that accomplishes one or more of the following:


1. It provides information about the intended collection, storage and/or data pro- 
cessing to the Data Subject, or a proxy acting on behalf of the Data Subject, in 
order to enhance the Data Subject’s privacy; 

2. It provides the Data Subject with an overview of what personal data have been 
disclosed to which Data Controller under which policies; 

3. It provides the Data Subject, or her proxy, online access to her personal data, to 
information on how her data have been processed and whether this was in line 
with privacy laws and/or negotiated policies, and/or to the logic of data process- 
ing in order to enhance the Data Subject’s privacy; 

4. It provides “counter profiling” capabilities to the Data Subject, or her proxy, 
helping her to “guess” how her data match relevant group profiles, which may 
affect her future opportunities or risks. 


Within PrimeLife, several transparency tools have been developed covering points 1 
and 2 in the list above and the data access part of point 3. The Data Track described 
in Chapter 3, for example, has all of these characteristics and the Privacy Dashboard 
has elements of 1 and 2 in it. However, since these tools are presented elsewhere in 
this book, we will concentrate on the processing part of point 3 in the definition in 
this section.

In the following, Section 6.2 will give a short example to set the scene.
Section 6.3 will introduce the notion of Privacy Preserving Secure logging and outline
and motivate the requirements of such a log system. Section 6.4 will describe
previous work within the area, while Section 6.5 gives a technical overview of the log
system. Finally, Section 6.6 concludes and gives a small outlook on future work.


6.2 Setting the Scene 


So what good is transparency of processing for a user? Let us answer that
question by giving a small example case using the Data Subject Alice. Assume in the
following that Alice has an application capable of accessing and displaying, in a
user-friendly manner, the events that have happened to her data at different Data
Controllers. This could, for example, be an added functionality of the Data Track.

Alice has an e-mail address that she is very restrictive in using for anything but
personal communication with friends and family. Suddenly a lot of unsolicited
e-mails (spam) are dropping into the mailbox of this address. Alice gets a bit annoyed
and wonders what has happened. In order to find out, she fires up her Data
Track and performs a search of released data, giving her e-mail address as the search
key. The search shows that she accidentally used this e-mail address on
two occasions: once when ordering a book at bad books co and once when she
inquired about a flight at Cheep Airlines Ltd. Alice then asks the application to
access these two services and retrieve the events that are related to her data. She then
searches the events to find the ones that relate to the e-mail address. This
search turns up a record at bad books co showing that the e-mail address has
been forwarded to five third-party companies, all of which have
been sending unsolicited e-mails to her. Since the application also stores agreed
policies, it informs her that this transfer is a violation of the agreement. Alice now
knows that the address was leaked by bad books co and that this was a policy violation.
With this information, she can take proper action against bad books co and ask for
her data to be removed there and at the third-party sites.

This is of course a limited and benign example, but similar actions could be taken
to find out about general policy violations, the reason for an automated decision, or who
has accessed personal data and why. For more discussions on the benefits of
transparency and examples see, e.g., [Hil09, SSA06, WABL+06].

The rest of this chapter will concentrate on one of the technologies behind the
scenes needed to implement process transparency in a privacy-friendly manner:
Privacy Preserving Secure logging.


6.3 On Privacy Preserving and Secure Logs


In order to know how personal data have been processed and used, there needs to
be a way of keeping track of events related to the data in the system, e.g., when
and how data have been accessed, by whom or by what process, and what
the reason for that access was. The most common way of keeping track of events in a
computer system is to keep a number of logs storing events of a specific type or 
related to a specific aspect of the system, e.g., security logs store security related 
events and application logs store relevant events connected to a specific application. 
As a consequence, we argue that there needs to be detailed logging of how personal 
data have been processed and used by the data controller on behalf of the Data 
Subjects whose personal data are being processed. However, the privacy preserving 
secure log is supposed to be beneficial to the privacy of the Data Subjects whose 
data are being processed: it should not become a privacy problem in and of itself. 
Thus, this log needs to have special properties. In general, a log can be viewed as a 
record of sequential data. A secure log, when compared to a standard log, protects 
the confidentiality and integrity of the entries in the log. A privacy preserving secure 
log, in addition to being a secure log, tries to address the privacy problems related to 
the fact that you are keeping a log of how personal data are processed and used for 
the sake of transparency. Each entry in the privacy preserving secure log concerns 
one Data Subject, the entity on whose behalf the entry is made. The log is secure in 
the sense that confidentiality is provided by encrypting the data stored in entries and 
integrity is provided by using hashes and message authentication codes. The log is 
privacy preserving by providing the following properties: 


1. The data in a log entry can only be read by the Data Subject to whom the entry 
relates. This ensures that no other entity can read the data stored in the log, which 
could violate the privacy of the data subject. 

2. Data Subjects and the Data Controller can independently validate the integrity
of the entries that concern them. If multiple entities are needed to validate the
integrity of parts of the log, no single entity can fully trust the integrity of, and
hence the data stored in, entries of the log.¹

3. Unlinkability of a Data Subject’s entries; that is, you cannot tell to which Data
Subject an entry relates. Without this property, it would be possible to tell how
many entries there are in the log for each Data Subject, which might reveal, for
example, how frequently a Data Subject uses a service.

4. The Data Controller safely provides anonymous access to the log entries for Data 
Subjects. Requiring authenticated access to entries could allow an attacker to link 
entries to Data Subjects. 


1 The Data Controller can validate the integrity of all the entries in the log without knowledge of
the contents. 


2 This is of course highly dependent on what is stored in the log.


6.3.1 Attacker Model and Security Evaluation


Our privacy preserving secure log provides the properties described previously for
all entries committed to the log prior to an attacker compromising the Data
Controller’s system running the logging system. Once compromised, little can be done
to secure or provide any privacy to future entries committed to the log. The attacker
model is thus that the Data Controller is initially trusted and then at some point in 
time becomes compromised. Even if a large portion of the Data Subjects become 
compromised, we show in [HPHL10] that the properties of the log hold until the 
point in time when the Data Controller becomes compromised. 

We present some arguments throughout the technical overview in Section 6.5
as to why the properties in question hold for our log. A more complete security 
evaluation of our privacy preserving secure log can be found in [HPHL10]. For 
Data Subjects to learn when a Data Controller has been compromised, we assume 
that there exists some mechanism, such as software working with a TPM (such 
as the PRIME core) or regular audits by a trusted third party. Depending on the 
anonymity service used by Data Subjects to access the API of the log, there might 
be attacks that allow an attacker to link the downloaded entries to the Data Subject. 
One example would be the use of a low-latency anonymity network such as Tor and 
an attacker doing end-to-end correlation. 


6.4 Prior Work and Our Contribution 


There exist a number of solutions for secure logging in the literature. The most
relevant is the Schneier-Kelsey secure log [SK98], which was used as a foundation 
for our privacy preserving secure log, but also work done by Holt [Hol06], Ma et 
al. [MT07, MT08] and Accorsi [Acc08]. In general, these solutions provide con- 
fidentiality and integrity of log entries committed to the log prior to an attacker 
compromising the logging system. This is accomplished by depriving an attacker 
of access to one or more cryptographic keys used when creating the entries com- 
mitted to the log, either by destroying old keys or using public key cryptography. 
None of the solutions mentioned fully addresses the problem of unlinkability and
anonymous access. Some work on unlinkability in connection with logs has been
done by Wouters et al. [WSLP08]. However, this work primarily addresses the
unlinkability of logs between logging systems in an eGovernment setting rather than 
unlinkability of log entries within a log. Further, they do not address the problem of 
an inside attacker or provide anonymous access to log entries. Our main contribu- 
tions are in the area of unlinkability and the ability to safely provide anonymous ac- 
cess to log entries. For details concerning our contributions and ongoing work, see 
[HPHL10, EP10, HH10]. Further work is being done on developing a distributed
version of the log, allowing transparency logging to continue when data about a
Data Subject is shared from one Data Controller to another in a privacy-friendly
manner.


3 See https://blog.torproject.org/blog/one-cell-enough


6.5 Technical Overview 


Conceptually, the privacy preserving secure log, hereafter referred to as simply “the
log,” can be divided into four parts: state, entry structure, storage and the API. With
the help of these four parts, we will explain how the properties outlined in
Section 6.3 are accomplished by the log.


6.5.1 State and Secrets 


As stated earlier, each entry in the log relates to a Data Subject. When an entry is
added to the log, both the data to be logged and the identifier⁴ of the Data Subject
for whom the entry is created have to be provided. To be able to log data for a Data
Subject, the Data Subject’s identifier must have been initialised in the log’s state. 
Formally speaking, the state maps (entity, attribute name) to an attribute value. 

When a new Data Subject is initialised in state, a unique identifier and three 
values, the Data Subject’s secret, seed and public key, are stored in the log’s state. 
These three values are provided by the Data Subject and only the Data Subject 
knows his private key. The secret and seed are large random values. For each Data 
Subject identifier, the state keeps track of the following attributes: 


• AK - The authentication key for the next entry in the log. The initial value is
derived from the Data Subject’s secret.

• PK - The public key.

• ID - The value of the identifier for the previous entry in the log for the Data
Subject in an obfuscated form. The initial value, since for the first entry there is
no previous entry, is the seed provided by the Data Subject.

• Chain - The value of the chain for the previous entry in the log for the Data
Subject in an obfuscated form. There is no initial value for this attribute.


The log’s state evolves as new entries are added to the log. The old values stored in 
state, if overwritten as part of the state update procedure”, are irrevocably deleted 
from the Data Controller. This is accomplished by cryptographic hashes and mes- 
sage authentication codes, with the authentication key for the entry as part of either 
the data being hashed or as a key. This is the main procedure that leads to the “prior 
to” property described earlier; when the attacker compromises the Data Controller, 


4 The only requirement from the log is that all Data Subject identifiers are unique. A user could be 
known under any number of different data subject identifiers. 


5 Details can be found in “Adding Secure Transparency Logging to the Prime Core” [HPHL10].
the information needed to compromise the entries already committed to the log is
missing or computationally hard to generate. 

The Data Controller, like all Data Subjects, has an entry in the log’s state with 
one additional attribute: a signing key used by the logging system for signing the 
data stored in entries. The initial secret and seed of the Data Controller’s system 
should not be stored on the Data Controller to ensure that the “prior to” property 
holds true for the Data Controller as well. 
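
The state described in this section can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the hash function, and both the derivation of the first authentication key from the secret and the rule "next key = hash of the current key" are our assumptions, not the construction specified in [HPHL10]. The class and method names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """One-way hash; SHA-256 is an assumed stand-in for the scheme's hash."""
    return hashlib.sha256(data).digest()


class LogState:
    """Maps (entity, attribute name) to an attribute value."""

    def __init__(self):
        self._state = {}

    def init_subject(self, subject_id, secret, seed, public_key):
        if (subject_id, "PK") in self._state:
            raise ValueError("Data Subject identifiers must be unique")
        # Assumed derivation of the first authentication key from the secret.
        self._state[(subject_id, "AK")] = h(b"AK" + secret)
        self._state[(subject_id, "PK")] = public_key
        # No previous entry exists yet, so ID starts out as the seed.
        self._state[(subject_id, "ID")] = seed
        # "Chain" deliberately has no initial value.

    def get(self, entity, attribute):
        return self._state.get((entity, attribute))

    def evolve_key(self, subject_id):
        """Return the current AK and overwrite it with its hash (assumed rule)."""
        current = self._state[(subject_id, "AK")]
        self._state[(subject_id, "AK")] = h(current)
        return current
```

Because the hash is one-way, the overwritten key cannot be recomputed from the key that replaces it, which is the intuition behind the “prior to” property. Note that overwriting a Python dictionary value does not securely erase memory; a real implementation has to take care of that separately.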


6.5.2 Entry Structure and Storage 


A log entry consists of five fields: two identifiers, two chains and the data field. 
The Data Controller and the Data Subject have an identifier and chain each, see 
Figure 6.1. 


Fig. 6.1: The five fields of an entry in the log. 


The purposes of the different fields are: 


• The identifier field contains a unique identifier that only the entity that the field
belongs to can generate by having knowledge of the authentication key used to
generate it. This is a hash.

• The chain field provides cumulative verification of the integrity of the entries
in the log; either all entries in the log, for the Data Controller with help of the
controller chain, or all the entries that belong to a specific Data Subject with
help of the subject chain. This is the field that allows for independent integrity
validation by each entity. This is a message authentication code.

• The encrypted data field provides data confidentiality by encrypting the data
with the public key of the Data Subject, ensuring that only the Data Subject can
read the data.


Log entries are stored in the log’s storage. Storage is a multiset or bag, where entries 
are stored without any order. Entries can be retrieved from storage based on the value 
of the identifier field for the Data Controller or Data Subject. 
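
The entry layout and the storage lookup can be illustrated with a short sketch. The exact inputs to the hash and to the message authentication code are not given in this section, so the two helper functions below use assumed inputs (authentication key plus previous identifier, and previous chain plus ciphertext); all names are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    controller_id: bytes
    controller_chain: bytes
    subject_id: bytes
    subject_chain: bytes
    encrypted_data: bytes


def make_identifier(auth_key: bytes, prev_id: bytes) -> bytes:
    # Assumed construction: hash over the authentication key and previous identifier.
    return hashlib.sha256(auth_key + prev_id).digest()


def make_chain(auth_key: bytes, prev_chain: bytes, ciphertext: bytes) -> bytes:
    # Assumed construction: MAC keyed with the authentication key.
    return hmac.new(auth_key, prev_chain + ciphertext, hashlib.sha256).digest()


class Storage:
    """Bag of entries that can be looked up by either identifier field."""

    def __init__(self):
        self._by_identifier = defaultdict(list)

    def add(self, entry: Entry) -> None:
        self._by_identifier[entry.controller_id].append(entry)
        self._by_identifier[entry.subject_id].append(entry)

    def find(self, identifier: bytes) -> list:
        return list(self._by_identifier.get(identifier, []))
```

An in-memory dictionary still remembers insertion order, so this sketch only shows the lookup interface; it does not provide the unordered storage that the log requires.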
6.5.3 API


The log provides an unauthenticated stateless API for Data Subjects to access entries 
stored in the log. It can be accessed anonymously by Data Subjects if they use an 
anonymising service such as Tor. The following two methods are exposed: 


• GetEntry(identifier) - returns the entry whose subject identifier field matches
the given identifier from storage. If no entry is found, a dummy entry is generated
and returned instead.

• GetLatestEntryID(identifier)⁶ - returns the identifier stored in the log’s state
for the Data Subject with the given identifier.


Providing anonymous access to entries is safe because of how the entries are struc- 
tured. An attacker lacking knowledge of the corresponding private key that decrypts 
an entry cannot read the entry’s contents. An attacker without knowledge of the au- 
thentication key used to generate any identifier or chain cannot use the values to link 
any entries together. 
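
The behaviour of GetEntry for unknown identifiers can be sketched as below. The field lengths are assumptions made for the illustration (in particular, a fixed-length data field), and the function names are hypothetical; the point is only that a caller cannot tell a missing entry from a stored one by the shape of the reply.

```python
import secrets

# Assumed byte lengths of the five fields: two identifiers, two chains, data.
ENTRY_FIELD_SIZES = (32, 32, 32, 32, 256)


def dummy_entry() -> tuple:
    """Random bytes shaped like a real entry."""
    return tuple(secrets.token_bytes(size) for size in ENTRY_FIELD_SIZES)


def get_entry(storage: dict, identifier: bytes) -> tuple:
    """Return the stored entry, or a freshly generated dummy if none exists."""
    entry = storage.get(identifier)
    return entry if entry is not None else dummy_entry()
```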


6.5.4 Unlinkability 


Unlinkability between log entries and Data Subjects for entries committed to the log 
prior to an attacker compromising the Data Controller’s system is provided because:


• The data field is encrypted with the Data Subject’s public key using a KEM-DEM
[ISO] hybrid cipher built from probabilistic encryption schemes with key privacy
[BBDP01]. This means that an attacker cannot learn which public key,
out of all the public keys in the system, was used to encrypt the data by inspecting
the encrypted data or by encrypting common log data and comparing the
results.

• The identifier and chain fields are created using ideal hash and message authentication
code algorithms respectively, where either as part of the data or as a key,
an authentication key is used that is no longer known to an attacker.

• The entries are stored in a multiset, that is, they are not in chronological order.
This is important to prevent correlation attacks that use other logs in the system,
such as the Apache access log. We address this issue for the implementation
of our log in [HH10], where we present a solution that destroys the recorded
chronological order of entries inserted into relational databases with minimal
effect on the average time for inserting entries into the log.


6 This is a method that is not strictly needed: the Data Subject could query the GetEntry-method 
until an invalid entry is returned, but its use makes some attacks on the unlinkability property of the 
log harder for an attacker. Further, the returned value is encrypted using probabilistic encryption to 
prevent an attacker from determining when a new entry is added to the log for a Data Subject. See 
[EP10, HH10] for more information. 
6.6 Conclusion and Outlook


Within PrimeLife, several tools with transparency-enhancing aspects have been
developed. In this section, we have discussed one of them: a tool to help the
Data Subject (or an external auditor) determine how her personal data have been
used by the Data Controller. A privacy preserving secure log enables a Data Con- 
troller to store log messages concerning how a Data Subject’s personal data is pro- 
cessed without the log in and of itself becoming a privacy problem. We have dis- 
cussed the requirements needed for such a log and by implementing it within the 
PrimeLife project shown that it is indeed possible to create a practical log system 
that fulfils all the requirements. The log itself is currently implemented as a module 
and can be integrated, fulfilling certain conditions, in a system under development. 
However, it is not that useful for a normal user unless it can be accessed easily and 
the information it contains can be shown in a user friendly manner. Because of this, 
we are now concentrating our efforts on providing a user interface that can present 
the log records in a way that is understandable for a user and that also automati- 
cally retrieves and verifies the log in a privacy-friendly manner. This user interface, 
when finished, could for example be integrated in the Data Track in order to create 
an application similar to the one Alice is using in the example. We would also like 
to investigate what events need to be logged in order to devise a method for estab- 
lishing an optimal log strategy, trying to find a balance between completeness and 
efficiency. 


Chapter 7 
Interoperability of Trust and Reputation Tools 


Sandra Steinbrecher and Stefan Schiffner 


Abstract Reputation systems naturally collect information on who interacts with
whom and how satisfied the interaction partners are with the outcome of the
interactions. Opinions of and about natural persons are personal data and need to be
protected. We elaborate requirements for electronic reputation systems. We focus on
security properties (in a secure system, an attacker cannot forge reputation
values) and we elaborate privacy protection goals. A short literature survey on
system implementations is given. We discuss the interoperability of different reputation
providers, that is, how a reputation can be transported from one reputation provider
to another. Finally, we show how reputation systems should be integrated in identity
management systems.


7.1 Introduction 


Privacy enhancements constrain trust establishment in social software, since trust 
is usually based on information about the user’s past behaviour observed by other 
users. Privacy enhancing technologies follow data minimisation and control paradigms, 
hence the access to user-related information is restricted by design. However, some 
information is needed to base trust upon. Crypto mechanisms, on one hand, allow 
for legal enforceability of (pseudonymous) interaction partners to prevent them from 
misbehaving. Many interactions, on the other hand, are informal, or it is too expen- 
sive to enforce liability. For these reasons, social software often deploys reputation 
systems as additional mechanisms for establishing trust in user interactions. Repu- 
tation objects are not only users, but products, web content and services; or more 
generally, anything users depend on for their goals. In Chapter 3 of this book we
already outlined and discussed the scenario of web content. In this section, we
generalise this scenario to a more comprehensive one and outline the social needs
(Section 7.2), the legal aspects (Section 7.3), the resulting requirements (Section 7.4),
and finally the technical implementability of reputation management (Section 7.5).
We focus on its interoperability in Section 7.6.


7.2 Social need 


People maintain relations and rely on them in various ways. Who did not already 
book a hotel that a colleague has been to before? Who never read a book a friend 
recommended? Who does not sometimes go to the same doctor as his pals? When 
using the Internet, many users also rely on these relations and make use of Web 2.0 
applications to make these links explicit. Such social networking software allows
users to maintain a list of buddies/friends, share information with them and explore 
the social structure of their peer group. 

With the growing usage of the Internet, the social network of humans covers the 
Internet as well. Hence, there is an interest of users in transferring the trust other 
human beings place in them in the offline world to the Internet. Also the respective 
applications are interested in getting information about their users from the offline 
world. In addition, people make links to other Internet users they did/do not know 
personally. These links can be made explicitly by direct interaction, e.g.,
discussions in Internet communities; or the links might be made implicitly by
Internet applications that link users, e.g., according to common interests.

Users may not be totally aware of these links, e.g., when they are recommended 
books in an Internet shop. However, these links have a high potential to be used for 
trust decisions as in the offline world. In fact, from hotel ratings (e.g., TripAdvisor¹)
and book reviews (e.g., Amazon²) to medical advice, all physical-world examples
from above have their electronic counterparts, which are present in many applications.
Thus users desire to transfer the trust (the users of) one application place(s) in them 
also to other applications and their users. Also, the respective applications are in- 
terested in getting information about users from other applications to make up their 
own trust decision as to how to treat this user. 

Social scientists and theoretical economists model the problem of whether two 
users who want to interact, should place trust in each other as trust games [CW88, 
Das00] that need inter-personal context-specific trust. In this context, the term
social network often refers to a way of modelling the relations between individuals. We
represent those by a graph that has a vertex as representation for each individual
and directed and labeled edges for relations. The edges describe a possible relation
the human beings have to each other, e.g., friendship, common interests, dislike.
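
Such a graph can be written down directly. The following is a minimal sketch with hypothetical names; it only stores directed, labeled edges between individuals and lets one query them.

```python
from collections import defaultdict


class SocialNetwork:
    """Directed graph with one vertex per individual and labeled edges."""

    def __init__(self):
        self._edges = defaultdict(set)  # source -> {(label, target)}

    def add_relation(self, source: str, label: str, target: str) -> None:
        self._edges[source].add((label, target))

    def relations(self, source: str, label=None) -> list:
        """Targets that source is linked to, optionally filtered by label."""
        return sorted(
            target
            for (edge_label, target) in self._edges.get(source, ())
            if label is None or edge_label == label
        )
```

Because the edges are directed, a relation recorded from one individual to another says nothing about the reverse direction.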

A reputation network is a social network that links users (possibly pseudony- 
mously) to each other and allows them to interact and exchange information with 
and about each other. For the concept of reputation, we assume that a user’s past can 
predict his future behaviour. The creation of this prediction is called learning. Users 


1 http://www.tripadvisor.com/ (last visited Jan. 2011)
2 http://www.amazon.com/ (last visited Jan. 2011)
within the reputation network can learn an object’s reputation from other users in
the network who share experiences with the reputation object. In social sciences,
this is called the learning mechanism of the reputation network [BR01].

If users in the reputation network are reputation objects (e.g., as seller in a mar- 
ketplace or author in a wiki) or can influence reputation objects (e.g., as author for 
the content created being the reputation object), users may control others in the 
reputation network by spreading information about the respective user. This will 
hopefully influence the user’s behaviour positively. If the (behaviour of the) reputa- 
tion object cannot be changed by the users it should at least be neglected by them. 
In social sciences, this is called the control mechanism of the reputation network 
[BR01].

Reputation systems assist reputation networks technically. To implement both the 
learning and the control mechanism of the reputation network, a reputation system 
has to offer the following functions to its users [Ste09]: 


Learning mechanism and evaluation function: The reputation system provides users 
with an evaluation function to learn a reputation object’s reputation following 
specific rules. Possibly, every evaluator might receive a different reputation of 
the reputation object. The selection of ratings used by the evaluation function 
depends on both the information flow of ratings in the reputation network and the 
trust structure in the reputation network, i.e., how users trust in others’ ratings. 

Control mechanism with a rating function: Control is exercised after an interac- 
tion takes place. It gives feedback to the user and these ratings are kept in a 
history for later evaluations. There are two types of users who can make use of 
the control mechanism: the interaction partner in the form of interaction-derived 
reputation and possible observers in the form of observed reputation [Mui03]. 
The reputation system provides authorised raters with a rating function that al- 
lows them to map reputation objects to ratings. The reputation system updates 
the reputation of the reputation object from the ratings received with a reputation 
function. 
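
The interplay of the three functions can be sketched in a few lines. The 1 to 5 rating scale, the arithmetic mean as reputation function and the selection of ratings by a set of trusted raters are assumptions chosen for the illustration; they are not prescribed by [Ste09].

```python
class ReputationSystem:
    def __init__(self):
        self._history = {}  # reputation object -> list of (rater, rating)

    def rate(self, rater, obj, rating):
        """Rating function: map a reputation object to a rating and keep it."""
        if not 1 <= rating <= 5:
            raise ValueError("rating outside the assumed 1..5 scale")
        self._history.setdefault(obj, []).append((rater, rating))

    @staticmethod
    def reputation(ratings):
        """Reputation function: aggregate ratings (mean, an assumption)."""
        return sum(ratings) / len(ratings) if ratings else None

    def evaluate(self, obj, trusted_raters=None):
        """Evaluation function: aggregate only the ratings this evaluator trusts."""
        selected = [
            rating
            for (rater, rating) in self._history.get(obj, [])
            if trusted_raters is None or rater in trusted_raters
        ]
        return self.reputation(selected)
```

Two evaluators who trust different raters obtain different reputations for the same object, which mirrors the remark that every evaluator might receive a different reputation.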


7.3 Legal Aspect 


Reputation systems can be seen as databases that collect information about who in- 
teracted with whom in which context and the respective persons’ opinion about each 
other based on this interaction. According to Bygrave [Byg02], opinions about a nat- 
ural person are personal data, so that the respective person’s right on informational 
self-determination is applicable. Therefore, explicit reputation should only be accu- 
mulated about users who agreed to accumulation. This reputation should only be 
shown to others after users give their consent. Furthermore, reputation information 
should be protected by means of technical data protection, as outlined by Mahler 
and Olsen [MO04]. The usage of pseudonyms can help here but if the granularity of 
the reputation is too fine, the reputation itself becomes a quasi-identifier that allows 
pseudonymous profile building. 
7.4 Security and Privacy Requirements


As for many other technical systems, security and privacy requirements were not a 
major issue when the first reputation systems were designed and established. With 
their wider application, a growing number of reputation systems are subject to various
attacks as outlined, e.g., in an ENISA position paper [ENI07]. Thus, the security and
privacy requirements of reputation systems have been studied. This chapter is mainly
based on [Ste09, ENI07, Vos04, SCS10]. However, these papers are written from
different perspectives, while we focus on the technical functionalities of a reputa- 
tion system as identified in Section 7.2. Those lead to the following building block 
requirements: 


Rating function: 


• Accountability of ratings: Users want raters to be accountable for the ratings
they give for reputation objects.
• Raters’ anonymity: Users want to rate anonymously.


Reputation function: The reputation system updates the reputation object’s reputation
from the ratings received. The reputation function follows specific rules fixed
by the system designer. These rules typically depend on the application scenario
and have to fulfill sociological and economic requirements. However, the following
requirements should hold:


• Completeness of reputation: Users want the aggregated reputation to consider
all ratings that have been given and that are available to them according to the
information flow in the reputation network.

• Liveliness of reputation: Reputation should always consider all recent interactions
or give users an indication that there are no more. Especially users who
are reputation objects or can influence reputation objects should not have the
possibility to reach a final state in which bad behaviour no longer damages
the respective reputation.


Evaluation function: The aggregated reputation of a reputation object can be 
shown to other users on request. Therefore, the following requirements apply: 


• Availability of reputation: All users in the reputation network need to be able
to access a reputation object’s reputation; however, if the reputation object is
a user, this might require his consent from a legal perspective.

• Evaluator’s anonymity: Users want to evaluate reputation anonymously to
prevent others from building personal behaviour profiles of their interests.

• Possibly reputation object’s anonymity: If the reputation object is a user, he
might not want to be linked to his past interactions (except that these contributed
to his reputation) to prevent others from building profiles about all
his interactions and interaction partners.

• Unlinkability of reputation objects: If reputation objects have some relation to
each other, this should not be revealed.
• Persistence of reputation objects: Users’ reputations need to be persistent and
enduring [RKZF00]. While the first property ensures that re-entering with a
neutral reputation is hard, the latter ensures that a user has a long history to
learn from.

• Individual scalability of reputation: Users want to be able to decide on the
delivered reputation depending on the trust structure the reputation network
has for them.


Actions in the reputation network should not affect any other requirements on
interactions in the reputation network. This calls for unlinkability of actions by the
same user as well as for his anonymity when acting.


7.5 Technical Implementability 


Reputation systems have been widely studied in social sciences, economics and 
computer science. Special attention has been paid to the possible design of reputation
system architectures and reputation functions, i.e., how to calculate a reputation
from given ratings. An overview of architectures is for example provided by Voss 
in [Vos04], while possible reputation functions are, for example, outlined by Mui in 
[Mui03]. For an economic introduction, we refer to Dellarocas’ work [Del03]. 

It is quite clear that it is difficult or even impossible to design a reputation system 
that fulfills all security requirements. However, there exist a number of approaches 
that try to fulfill at least a significant subset. As the focus of our work is to outline 
how reputation systems can be applied to the scenario of privacy-enhanced social 
software and privacy-enhancing identity management, we concentrate on the fol- 
lowing privacy requirements: anonymity of raters, evaluators, and reputation objects 
as well as unlinkability of reputation objects. 

In [PRT04], using anonymity services to achieve privacy for reputation sys- 
tems is proposed. However, this approach is inadequate since it only protects the 
evaluators. In order to obtain anonymity of raters and reputation objects, it needs 
to be ensured that many users are indistinguishable by an attacker, so that they 
are in large anonymity sets. For unlinkability of reputation objects, others should 
not be able to link interactions with the same user. The possibility of recognising 
users by reputation is limited if the set of possible reputations is limited [Ste09] 
or if the reputation is only published as an estimated reputation, as proposed in 
[Del00]. Transaction pseudonyms can be used to avoid linkability between transac- 
tions [ACBM08, Ste06]. In order to obtain anonymity of raters, interactions and 
ratings related to these interactions need to be unlinkable. This can be reached 
by a reputation provider who only calculates a new user reputation after it col- 
lected not only one but several ratings [Del06], or who only publishes an estima- 
tion of the actual reputation [Del00]. Further, a rater can be anonymous against 
the reputation provider by using convertible credentials [Ste09] or electronic cash 
[ACBM08, SCS10]. Furthermore, [Ker09] proposes a provably secure reputation system.
This system uses two trusted third parties (TTPs) that ensure that ratings and
reputations are unlinkable. However, it does not provide anonymity for the interaction
partners, since the authors argue that this would be useless: interaction in the
physical world requires addresses anyway for the delivery of goods and money transfer.
This is the reason why in [SCS11] a system based on DC-nets is proposed that provides
privacy in the form of information-theoretic relationship anonymity with regard to users.
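
Two of the measures mentioned above, updating a reputation only after several ratings have been collected and publishing only a value from a limited set, can be sketched together. The batch size, the rounding to whole numbers and all names are assumptions for the illustration, not the constructions of [Del06] or [Ste09].

```python
class BatchingReputationProvider:
    """Applies ratings in batches and publishes only a coarse reputation."""

    def __init__(self, batch_size: int = 3):
        self.batch_size = batch_size
        self._pending = {}  # reputation object -> ratings not yet applied
        self._applied = {}  # reputation object -> ratings already applied

    def submit(self, obj, rating: int) -> None:
        pending = self._pending.setdefault(obj, [])
        pending.append(rating)
        if len(pending) >= self.batch_size:
            # The published value changes only now, so an observer cannot
            # attribute the change to a single rating or interaction.
            self._applied.setdefault(obj, []).extend(pending)
            pending.clear()

    def published(self, obj):
        applied = self._applied.get(obj)
        if not applied:
            return None
        # Rounding limits the set of possible published reputations.
        return round(sum(applied) / len(applied))
```

With a batch size of three, the first two ratings for an object leave the published value unchanged; the third one releases all three at once.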

It was a focus of PrimeLife to contribute to the possible design options of
privacy-respecting reputation systems, as can be seen from the publications already
cited above [Ste09, SCS10, SCS11].


7.6 Infrastructure 


For all functions of reputation systems, namely rating, reputation and evaluation 
function, infrastructure needs arise. This means, a reputation system either has to be 
integrated in other systems or be closely interoperable with them. In the following, 
we discuss interoperability for both options, as we already published as a result of 
PrimeLife in [Ste10]. 


7.6.1 Interoperability with Applications 


Currently, a vision is emerging of stand-alone reputation systems that collect
information from various interactions in different applications.

According to the social needs of reputation systems outlined in Section 7.2, the 
applications, where the interactions rated took place, have to provide the reputation 
system with as much information as possible on the following aspects: 


• Model of Trust Game: Only users who gave a leap of faith to reputation objects
should be able to rate them. Applications have to make a clear model, who gave
a leap of faith, and specify this for the reputation system.

• Interaction information: As reputation is context-dependent, information on the
interaction rated is needed, e.g., time, value for the interaction partners.

• Rater information: As reputation needs to build on inter-personal trust, information
on the raters is also needed, as will be outlined in Section 7.6.2.


As there already exist a number of reputation systems integrated in applications, 
making these reputation system(s) interoperable becomes of interest. The problem 
of interoperability that is represented by the reputation exchange function in our 
model is twofold: 


• Format: Firstly, formats for common exchange and possibly also internal representation
of reputation are needed. An OASIS group³ is working on a possible


3 http://www.oasis-open.org/committees/tc\_home.php?wg\_abbrev\ 
portable format using XML. However, we currently still lack such a standard that
could be implemented. There is a need for solutions that can be easily integrated
into existing web technologies. The suggestion we implemented for PrimeLife
in [SGMO09] was to use the Resource Description Framework (RDF), which is common
on the Web 2.0 and allows reputation information to be added as meta-information
to arbitrary web content.

• Algorithm: In every reputation system, different implementations of the rating
function, evaluation function and reputation function are defined depending on the
system design. An algorithm for transferring reputations received from another
reputation system to one’s own reputation system is needed that balances possible
advantages and disadvantages. This algorithm needs to comprise inheritance
rules for reputations to decide on interoperability of reputation or ratings from
different reputation systems.


The OpenPrivacy Initiative⁴ presented Sierra, a reference implementation of a
reputation management framework comprising several components representing the 
functions of the reputation system as well as an identity management system. They 
also define reputation exchange functions, whose actual implementation can be de- 
termined by the system designer in terms of exchange rates between reputations 
calculated from different reputation functions. 
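
To make the idea of an exchange rate concrete, the following sketch converts a reputation from the scale of one system to the scale of another. It is not Sierra's interface: the linear mapping, the interpretation of the rate as a discount between 0 and 1, and the function name are all our assumptions.

```python
def exchange_reputation(value, source_scale, target_scale, rate=1.0):
    """Map a reputation onto another system's scale, discounted by a rate."""
    src_lo, src_hi = source_scale
    dst_lo, dst_hi = target_scale
    if not src_lo <= value <= src_hi:
        raise ValueError("value outside the source scale")
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie between 0 and 1")
    # Normalise to [0, 1], discount, then stretch to the target scale.
    normalised = (value - src_lo) / (src_hi - src_lo)
    return dst_lo + rate * normalised * (dst_hi - dst_lo)
```

For example, a 4 on a 1 to 5 scale maps to 75.0 on a 0 to 100 scale at full rate, and to 37.5 if the importing system chooses to give imported reputation only half weight.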

However, there are other issues of interoperability between reputation systems so 
far neglected by the technical literature: 


• Several reputation exchanges: When the reputation exchange function is executed
several times between two reputation systems, it has to be ensured that it is clear
which part of the reputation has already been exchanged.

• Related reputation objects: So far, we assumed that the reputation object is well
determined. Another interoperability issue that reputation systems have to deal with
is the possible relation between distinct reputation objects. An example is the
wiki scenario from Chapter 1: The reputation system collecting the reputation of
content might need to exchange reputation with the reputation system collecting
the reputation of authors. Certainly there is some relation between a piece of content
and its authors, but it might not be advisable to transfer the reputation of a piece of
content directly to its authors and vice versa. Thus reputation systems need to define
the transfer of reputation between related objects by a reputation object exchange function.


Aside from the OpenPrivacy Initiative, there are commercial stand-alone systems
such as iKarma⁵, which is a “third-party service for collecting, managing and
promoting [your] reputation among [your] customers and contacts”, or portals like
Trivago⁶ that comprise reputation information from various other reputation systems.

The scientific approaches that outline reputation infrastructures independent
of concrete applications (e.g., [Vos04, PS08, KK09, SCS10, SCS11]) do not follow
the centralised approach of the commercial solutions, but use local storage of
reputation information to enable users to show the reputation they collected to others
themselves. All of these suggestions need some external infrastructure to prevent
reputation manipulation by the reputation object.


4 http://openprivacy.org/ (last visited Jan. 2011)
5 http://ikarma.com/support/faq/#1 (last visited Jan. 2011)
6 http://www.trivago.com/ (last visited Jan. 2011)

In the mentioned scientific approaches, the trust model is implicitly clear. However,
as all of them aim for a privacy-respecting reputation system, neither interaction
nor rater information is provided. In the commercial solutions, users can provide as
much information as they want about themselves and their interactions.


7.6.2 Interoperability with Trust Management 


As outlined in Section 7.2, reputation networks need to have some kind of inherent 
trust structure. When a user wants to determine a reputation object’s credibility, that 
is trustworthiness, he has to determine his trust in two other sources as well: 


• Raters: The ratings given by raters can be:


— subjective ratings, which are influenced by the raters’ subjective estimation of
the reputation object, or

— objective ratings, which can be verified at some point in time by all users other
than the rater, who would then come to the same ratings.


An example for the first type of ratings is eBay while examples for the second
type can be found in P2P systems, e.g., GNUnet⁷, where the reply to a query
leads to a positive reputation, and a reply can be proved or verified at least at the
time it is sent.
If the raters are humans, subjective ratings will be given. Then the evaluator needs to
decide whether he would have come to the same rating; this means their views on
the reputation object are interoperable. For this reason, a trust management system
to determine the inter-personal trust in raters is needed. It can be realised by an
additional reputation system for raters.

• Reputation systems: Evaluators need to have system trust in all reputation systems
that collected the ratings and calculated the reputation the user evaluates.


Technically, trust management is often associated with PKI structures [Mau96]
(among other approaches). PKI structures allow for binding keys to pseudonyms.
Others can use their key to sign this binding. Thereby chains to other users, who 
want to trust in this binding, can be built. These chains can be constructed hierarchi- 
cally with certification authorities or in the form of a web of trust (e.g., GPG/PGP). 
Both structures can also be used for the broader deployment of reputation systems. 
Hierarchies and chains as they work for trust management can be applied to reputa- 
tion management to express which experiences from others can be trusted. 
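
A chain of signed bindings, whether it runs through a hierarchy or a web of trust, can be found with an ordinary graph search. The sketch below assumes a hypothetical dictionary that records, for each signer, the pseudonyms whose key bindings the signer has signed; no real PKI library is involved.

```python
from collections import deque


def find_trust_chain(signatures, start, target, max_length=4):
    """Shortest chain of signed bindings from start to target, or None.

    max_length bounds the number of signatures in the chain.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_length:
            continue
        for signed in sorted(signatures.get(path[-1], ())):
            if signed not in seen:
                seen.add(signed)
                queue.append(path + [signed])
    return None
```

Bounding the chain length reflects the common intuition that trust weakens with every additional intermediary; the bound of four is an arbitrary choice.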

However, the straightforward approach to implement ratings as signatures and 
use existing PKI structures only assures accountability of keys and linkage to their 


7 www.gnunet.org (last visited April 2010)
holder. If a user or certification authority signs someone’s key in a PKI
structure, that does not say anything about the credibility/competence they assume
the key holders to have as reputation objects. For this reason, keys different from
those used for accountability are needed and existing certificate structures have to be
extended appropriately.


7.6.3 Interoperability with Identity Management 


For the evaluation function of reputation systems, not only the overall reputation, but 
also the single ratings and the raters who gave them, might be important. If raters 
misbehave maliciously by giving ratings that do not reflect the concrete experience 
they had with reputation objects, there should be a possibility to detect this and 
probably to make them accountable for it. 

With the collection of large reputation profiles about users (both reputation objects
and raters), privacy becomes an important issue. A reputation system should be
interoperable with privacy-enhancing user-controlled identity management systems
(PE-IMS). An IMS in general is able to certify users and grant rights to them for
applications. Additionally, a PE-IMS [CPHH02, CKO01] like PRIME⁸ assists users
platform-independently in controlling their personal data in various applications
and selecting pseudonyms appropriately, depending on their wish for pseudonymity
and unlinkability of actions.

The interoperability of a reputation system with a PE-IMS needs a privacy- 
respecting design of reputation systems, while keeping the level of trust provided 
by the use of reputations as outlined in [Ste09]. 

When a reputation system interoperates with a PE-IMS, it is possible and in- 
tended that users have several partial identities (pIDs) which cannot be linked, nei- 
ther by other users using the systems nor by the underlying system. Hence, both 
raters and reputation objects are only known by pseudonyms to each other. 

If only one reputation value existed per user, all pIDs of this user
would have the same reputation. This would ease the linking of the pIDs of one
user because of the same reputation value. Thus, having separate reputations per
pID and not only one per user is a fundamental condition for a reputation system in
the context of identity management.

The use of pIDs raises the problem that a malicious user may rate himself many
times, using a new self-created pID for every rating, in order to improve his own
reputation. This kind of attack is also known as a Sybil attack [Dou02]. If the reputation
system is not designed carefully, such an attacker could easily improve his
own reputation without justification. This can be limited or prevented by entrance fees
or the use of once-in-a-lifetime credentials as suggested in [FR99]. When using PRIME as
IMS, the latter can be implemented by its identity provider issuing such credentials.
Alternatively or additionally, fees could also be collected.


8 Privacy and Identity Management for Europe (http://www.prime-project.eu/),
funded by the European Union in the 6th Framework Programme, 2004-2008.


7.6.4 Resulting implementation


For users as reputation objects, we outline in the following paragraphs a possible
resulting implementation. Our design description is independent of concrete rating,
reputation and evaluation functions.

All communication is encrypted to ensure the confidentiality of all ratings
and actions performed. In addition, all messages can be transferred anonymously
over an anonymous communication network. All actions and ratings are secured by
digital signatures (given under a pseudonym using PRIME) for integrity reasons.


[Figure 7.1: numbered message flow. 1. identity data; 2. identity
certification/credential; 3. show credential/certification; 4. reputation
certificate/credential; 5. show reputation; 6. application credential;
7. exchange additional information. Further labels in the figure: reputation
pseudonym, application pseudonym.]


Fig. 7.1: Infrastructure for users as reputation objects [Ste10].


For identity management, a user registers himself with an identity management
system (provider) by declaring his identity data (Step 1 in Fig. 7.1). After verifying
the data, the identity provider issues a credential or certification on (part of) these
data (Step 2 in Fig. 7.1). The use of an identity management system (provider)
makes the pseudonym accountable.

When the user wants to register with a reputation system (provider), he sends the
reputation system the certification/credential he got from the identity management
system (provider) (Step 3 in Fig. 7.1). This should guarantee that no user is able to
build up a reputation under multiple pseudonyms within the same context and that every
user can be identified in the case of misbehaviour. The reputation system (provider)
creates a reputation certificate/credential based on the certificate/credential from the
identity management system (provider) and sends it back to the user (Step 4 in
Fig. 7.1).

The reputation credential contains the user’s reputation pseudonym, his initial 
reputation and possibly other attributes such as the applications it can be used in or 
an expiration date. 

Based on the reputation credential, the user can register himself with an applica- 
tion by showing his reputation certificate/credential (Step 5 in Fig. 7.1). He thereby 
agrees that he will collect a reputation for his interactions within the application 
(e.g., a marketplace or a wiki) with the reputation system he registered with. Based 
on this, he gets an application credential to use the application (Step 6 in Fig. 7.1). 

Additionally, the user might interact with other users to exchange additional in- 
formation, e.g., via a trust management system to inform himself about this user 
(possibly as a rater) and other users in the reputation network (Step 7 in Fig. 7.1). 

Every action the user performs above can be done under distinct pseudonyms if 
convertible credentials are issued by the respective providers. 

We implemented this infrastructure with phpBB as the application and PRIME as the
user-controlled privacy-enhancing identity management system, as outlined in [PS08].
Our implementation currently lacks a trust management component.
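
To make Steps 1 to 4 of Fig. 7.1 more concrete, the following is a minimal sketch and not the PRIME or phpBB implementation. It assumes a toy `Provider` class in which an HMAC tag stands in for a real credential; the class names, the `register` method and the `context` parameter are hypothetical. Unlike the convertible credentials mentioned above, HMAC tags are linkable and can only be checked by their issuer, so the sketch illustrates only the rule that one identity credential yields at most one reputation pseudonym per context.

```python
import hashlib
import hmac
import secrets


class Provider:
    """Toy issuer: an HMAC tag stands in for a real (anonymous) credential."""

    def __init__(self, name):
        self.name = name
        self._key = secrets.token_bytes(32)

    def _tag(self, payload):
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def issue(self, payload):
        # Steps 1-2 for an identity provider: certify the declared data.
        return {"issuer": self.name, "payload": payload, "tag": self._tag(payload)}

    def verify(self, credential):
        return (credential["issuer"] == self.name
                and hmac.compare_digest(credential["tag"],
                                        self._tag(credential["payload"])))


class ReputationProvider(Provider):
    def __init__(self, name, identity_provider):
        super().__init__(name)
        self.identity_provider = identity_provider
        self._registered = set()  # (identity credential tag, context) pairs

    def register(self, identity_credential, context, initial_reputation=0):
        # Step 3: the user shows the identity credential.
        if not self.identity_provider.verify(identity_credential):
            raise ValueError("invalid identity credential")
        key = (identity_credential["tag"], context)
        if key in self._registered:
            raise ValueError("already registered in this context")
        self._registered.add(key)
        # Step 4: issue a reputation credential under a fresh pseudonym.
        pseudonym = secrets.token_hex(8)
        return self.issue(f"{pseudonym}|{context}|{initial_reputation}")


idp = Provider("identity-provider")
reputation = ReputationProvider("reputation-provider", idp)
identity_credential = idp.issue("name=Alice")
reputation_credential = reputation.register(identity_credential, "marketplace")
```

A second call to `register` with the same identity credential and context raises `ValueError`, which is the Sybil-limiting behaviour described in Section 7.6.3.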


7.7 Conclusion 


Although many proposals for sophisticated reputation systems exist in the scientific
literature, many applications on the Internet (especially the ones usable free of charge)
just try out simple reputation systems and adapt them slightly based on the experience
their providers gain. The security and privacy concerns of users and providers in
particular are becoming more and more important in order to prevent certain threats and
attacks. Within PrimeLife, it was possible to contribute to this trend by making
suggestions on interoperability and security aspects and implementing first tools that
show how these issues can be addressed.


Chapter 8 
Data Privacy 


Michele Bezzi, Sabrina De Capitani di Vimercati, Sara Foresti, Giovanni Livraga, 
Stefano Paraboschi, and Pierangela Samarati 


Abstract In today’s globally interconnected society, a huge amount of data about
individuals is collected, processed, and disseminated. Data collections often contain
sensitive personally identifiable information that needs to be adequately
protected against improper disclosure. In this chapter, we describe novel
information-theoretical privacy metrics, necessary to measure the privacy degree
guaranteed by a published dataset. We then illustrate privacy protection techniques,
based on fragmentation, that can be used to protect sensitive data and sensitive
associations among them.


8.1 Introduction 


In the modern digital society, information is one of the most valuable and demanded
resources. Nowadays, organisations and end users are resorting to service providers
for disseminating and sharing the huge amount of collected data they want to make
available to others. Although this solution guarantees high data availability at a
reduced price, it introduces new privacy and security concerns. Indeed, the collected
datasets often include sensitive personally identifiable information, which is no
longer under the direct control of its owners. In such a scenario, guaranteeing the
privacy of the data, be they published or outsourced to a service provider, becomes
a primary requirement.

A first step in the definition of methods that guarantee privacy protection in public
or semi-public release consists in the definition of privacy metrics, measuring the
degree of protection offered by a published dataset. Recently, most of this line of work
has focused on k-anonymity [Sam01] and its variations (e.g., ℓ-diversity [MGK06]
and t-closeness [LLV07]), which guarantee that the released dataset satisfies a given
protection degree, represented by the value of k (ℓ and t, respectively). These approaches
basically define a minimal requirement (worst-case scenario) that each combination
of entries in the dataset should satisfy. Although k-anonymity and its variations
are simple and effective, the privacy degrees they offer can be neither combined in a
unified privacy metric nor compared. To overcome these issues, novel metrics have
recently been proposed [Bez10] that express the disclosure risk in terms of information
theory. These solutions are based on the concept of one-symbol information,
which determines the contribution of each entry to the risk of disclosure and allows
for assessing the privacy offered by privacy protection techniques. The modeling of
the privacy problem, in the line of research mentioned above, typically assumes a
setting where data to be protected are either quasi-identifiers or sensitive information
associated with them. Novel proposals have instead considered the more general
problem of protecting arbitrary sensitive associations among data [ABG+05].
These solutions are based on fragmenting data to break associations among them
that should not be disclosed. The use of fragmentation for satisfying privacy needs
was first proposed in the data outsourcing scenario, where data are stored and
managed by an external service provider, to improve query evaluation efficiency.
Indeed, traditional approaches assume that an overlying layer of encryption is applied
on the data before outsourcing them (e.g., [HIM02a]). Based on the observation
that often what is sensitive is the association among data, more than the data per
se, novel proposals resort to fragmentation, possibly combined with encryption, for
privacy protection [ABG+05, CDF+07, CDF+09b].

In this chapter, we illustrate novel privacy metrics for assessing privacy protec- 
tion techniques and we describe recent proposals, based on the use of fragmentation, 
for protecting data privacy. The remainder of this chapter is organised as follows. 
Section 8.2 describes privacy metrics based on information theory concepts. Sec- 
tion 8.3 introduces the basic concepts on which fragmentation-based proposals for 
protecting the privacy of data rely. Section 8.4 presents an approach combining frag- 
mentation and encryption to protect sensitive associations among data. Section 8.5 
illustrates a proposal that departs from encryption and where a small portion of the 
data is stored on the data owner’s side to break sensitive associations. Section 8.6 
describes how fragmentation can also be adopted in the data publication scenario, 
possibly complementing fragments with loose associations, representing in a sani- 
tised form the associations broken by fragmentation, to increase data utility. Finally, 
Section 8.7 concludes the chapter. 


8.2 Privacy Metrics and Information Theory 


In the literature, several models have been proposed for capturing different
aspects of the disclosure risk [FWCY10]. From the data publisher’s point of view,
it would be desirable to have these privacy models expressed in terms of semantically
“similar” measures, so that she is able to compare their impact and optimise
the trade-off between the different privacy risks. In [AA01], the authors proposed
an information theoretic framework to express average disclosure risk using mutual
information. The advantages of the mutual information formulation are twofold:
i) it allows for expressing the different risk measures, and associated thresholds, in
a common framework, with well defined units; ii) it permits the application of a
wide range of well established information theory tools to risk optimisation (e.g.,
the privacy-distortion trade-off problem [RMFDF09]). In this section, we will present
some recent results on the information theoretic formulation of privacy risk measures.


(a) Original dataset

SSN    | DoB      | ZIP   | Illness
123456 | 56/02/04 | 26010 | Measles
234561 | 58/11/07 | 26015 | Asthma
345271 | 52/09/07 | 26123 | Flu
456291 | 71/06/07 | 40765 | Flu
563810 | 78/05/14 | 40123 | H1N1
678451 | 78/05/02 | 40672 | Flu
782340 | 81/97/11 | 70128 | Gastritis
895641 | 85/01/01 | 70542 | Chest pain

(b) Anonymised dataset

SSN | DoB        | ZIP   | Illness
**  | 195*/**/** | 26*** | Measles
**  | 195*/**/** | 26*** | Asthma
**  | 195*/**/** | 26*** | Flu
**  | 197*/**/** | 40*** | Flu
**  | 197*/**/** | 40*** | H1N1
**  | 197*/**/** | 40*** | Flu
**  | 198*/**/** | 70*** | Gastritis
**  | 198*/**/** | 70*** | Chest pain


Fig. 8.1: An example of a dataset (a) and its anonymised version (b).


8.2.1 Basic Concepts 


Existing privacy metrics (k-anonymity [Sam01], ℓ-diversity [MGK06] and
t-closeness [LLV07]) define minimal requirements for each entry in the dataset, but
because mutual information is an average quantity, frameworks expressing average
disclosure risk using mutual information are not able to completely express
these conditions on single entries. In fact, as pointed out in [LL09], privacy is an
individual concept and should be measured separately for each individual. Accordingly,
average measures such as mutual information are not able to fully capture
privacy risk. To overcome this limitation, we should consider one-symbol information
(i.e., the contribution to mutual information by a single record), and define the
disclosure risk metrics accordingly [Bez10]. By introducing one-symbol information,
it becomes possible to express and compare different risk concepts, such as
k-anonymity, ℓ-diversity and t-closeness, using the same units. In addition, it is
possible to obtain a set of constraints on the mutual and one-symbol information for
satisfying ℓ-diversity and t-closeness, and also to determine a relationship between
the risk parameters t and ℓ, which allows us to assess t in terms of the more intuitive
ℓ value.

Traditionally, all sensitive data that need to be protected are stored in a single
relation r over relation schema $R(a_1, \ldots, a_n)$, with $a_i$ an attribute on domain
$D_i$, $i = 1, \ldots, n$. From a disclosure perspective, attributes in R can be classified
as follows:


• Identifiers. Attributes that uniquely identify respondents (e.g., SSN).




• Quasi-identifiers (QIs). Attributes that, in combination, can be linked to external
information to re-identify all or some of the respondents, or reduce the uncertainty
over their identities (e.g., DateOfBirth, ZIP, Gender).

• Sensitive attributes. Attributes that contain sensitive information about respondents
(e.g., Illness, Salary, PoliticalParty).


There are two types of disclosure: identity disclosure, and attribute disclosure. 
Identity disclosure occurs when the identity of an individual can be reconstructed 
and associated with a record in the released dataset. Attribute disclosure occurs 
when an attribute value can be associated with an individual (without necessarily 
being able to link to a specific record). In anonymising the original data, we want 
to prevent both kinds of disclosure. In the anonymisation process, identifiers are 
suppressed (or replaced with random values), but this is not sufficient, since by 
combining the quasi-identifier values with some external source information (e.g., a 
public register), an attacker could still be able to re-identify a subset of the records 
in the dataset. 

Let us consider a dataset containing identifiers, quasi-identifiers (referred to as
X), and sensitive attributes (referred to as W). Figure 8.1(a) illustrates an example of
a relation including an identifier {SSN}, a quasi-identifier X = {DoB, ZIP}, and a
sensitive attribute W = {Illness}. We create an anonymised version of such data by
removing identifiers and anonymising the quasi-identifier (X), for example by
substituting its original values with more general ones [Sam01]. Figure 8.1(b) reports an
example of a dataset obtained by anonymising the relation in Figure 8.1(a).
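
As an illustration of this kind of generalisation, the sketch below suppresses the identifier and coarsens the quasi-identifier of a few records from Figure 8.1(a). The function name `generalise`, the `zip_digits` parameter and the assumption that two-digit years belong to the 1900s are ours; the generalisation levels (decade of birth, two-digit ZIP prefix) are chosen to match what Figure 8.1(b) appears to show.

```python
# A few records from Fig. 8.1(a): (SSN, DoB as yy/mm/dd, ZIP, Illness).
RECORDS = [
    ("123456", "56/02/04", "26010", "Measles"),
    ("234561", "58/11/07", "26015", "Asthma"),
    ("456291", "71/06/07", "40765", "Flu"),
    ("563810", "78/05/14", "40123", "H1N1"),
]


def generalise(record, zip_digits=2):
    """Suppress the identifier and coarsen the quasi-identifier {DoB, ZIP}."""
    _ssn, dob, zip_code, illness = record
    # Keep only the decade of birth (assumes years in the 1900s).
    decade = "19" + dob[0] + "*/**/**"
    # Keep only the leading digits of the ZIP code.
    zip_prefix = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return ("**", decade, zip_prefix, illness)


anonymised = [generalise(record) for record in RECORDS]
for row in anonymised:
    print(row)
```

The sensitive attribute is left untouched; only the identifier and the quasi-identifier are modified.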


8.2.2 Traditional Privacy Metrics 


Among the metrics proposed so far, let us describe three of them, which well illus- 
trate the different aspects of privacy risks. 

The k-anonymity [Sam01] condition requires that every combination of quasi-identifier
attributes (QI group) is shared by at least k records in the anonymised dataset. A
large k value indicates that the anonymised dataset has a low identity disclosure
risk, because an attacker has, at best, a probability of 1/k of re-identifying a record;
however, it does not necessarily protect against attribute disclosure. In fact, all the
records of a QI group (with a minimal size of k records) could have the same value for
the sensitive attribute, so even if the attacker is not able to re-identify the record,
she can discover the sensitive information.

ℓ-diversity [MGK06] captures this kind of risk. The ℓ-diversity condition requires
that, for every combination of quasi-identifier attributes, there should be at least ℓ
“well represented” values for each sensitive attribute. In [MGK06], a number of
definitions of “well represented” were proposed. Because we are interested in
describing an information-theoretic framework, the most relevant definition for us is
in terms of entropy,

$$H(W|\tilde{x}) = -\sum_{w \in W} p(w|\tilde{x}) \log_2 p(w|\tilde{x}) \geq \log_2 \ell$$

for every QI group $\tilde{x}$, with $\ell \geq 1$; here $p(\tilde{x})$ is the frequency of
the QI group $\tilde{x}$ and $p(w|\tilde{x})$ is the relative frequency of the sensitive
attribute value w within the QI group $\tilde{x}$. For example, if each QI group has n
equally distributed values for the sensitive attributes, the entire dataset will be
n-diverse. Note that if the ℓ-diversity condition holds, the k-anonymity condition
(with k ≤ ℓ) also automatically holds, since there should be at least ℓ records for
each group of QIs.
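
Both conditions can be checked mechanically. The sketch below is an illustration under our own assumptions: the records are our reading of Figure 8.1(b), and the helper names are hypothetical. It computes the largest k for which the dataset is k-anonymous and the largest ℓ for which the entropy condition above holds in every QI group.

```python
from collections import Counter, defaultdict
from math import log2

# Our reading of Fig. 8.1(b): (generalised DoB, generalised ZIP, Illness).
ANONYMISED = [
    ("195*/**/**", "26***", "Measles"),
    ("195*/**/**", "26***", "Asthma"),
    ("195*/**/**", "26***", "Flu"),
    ("197*/**/**", "40***", "Flu"),
    ("197*/**/**", "40***", "H1N1"),
    ("197*/**/**", "40***", "Flu"),
    ("198*/**/**", "70***", "Gastritis"),
    ("198*/**/**", "70***", "Chest pain"),
]


def qi_groups(records):
    """Map each quasi-identifier combination to its sensitive values."""
    groups = defaultdict(list)
    for *quasi_identifier, sensitive in records:
        groups[tuple(quasi_identifier)].append(sensitive)
    return groups


def k_anonymity(records):
    """Largest k such that every QI group holds at least k records."""
    return min(len(values) for values in qi_groups(records).values())


def conditional_entropy(values):
    """H(W | group) in bits, from relative frequencies within the group."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def entropy_l_diversity(records):
    """Largest l such that H(W | group) >= log2(l) for every QI group."""
    return 2 ** min(conditional_entropy(v) for v in qi_groups(records).values())


print(k_anonymity(ANONYMISED))          # prints 2
print(entropy_l_diversity(ANONYMISED))  # limited by the group with two "Flu" records
```

The least diverse group determines ℓ, which is why the result is a worst-case figure rather than an average.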

Although the ℓ-diversity condition prevents a possible attacker from inferring the
exact sensitive attribute values, she may still learn a considerable amount of
probabilistic information. In particular, if the distribution of sensitive attributes within
a QI group is very dissimilar from the distribution over the whole set, an attacker
may increase her knowledge on sensitive attributes (skewness attack, see [LLV07]
for details). t-closeness [LLV07] estimates this risk by computing the distance
between the distribution of confidential attributes within the QI group and in the entire
dataset. The authors in [LLV07] proposed two ways to measure the distance; one of
them has a straightforward relationship with mutual information, as described in the
following.


8.2.3 An Information Theoretic Approach for Privacy Metrics 


Following [Bez10], we can reformulate the privacy metrics illustrated in Sec- 
tion 8.2.2 in terms of information metrics, therefore expressed using the same units 
(bits). 


k-anonymity 


In case of suppression and generalisation, a single QI group $\tilde{x}$ in the anonymised
dataset $\tilde{X}$ can correspond to a number $N_{\tilde{x}}$ of records in the original
table. Accordingly, the probability of re-identifying a record x given $\tilde{x}$ is simply
$p(x|\tilde{x}) = 1/N_{\tilde{x}}$, and k-anonymity reads:

$$H(X|\tilde{x}) \geq \log_2 k \tag{8.1}$$

for each $\tilde{x} \in \tilde{X}$. In terms of one-symbol specific information [DM99]
$I_2(X, \tilde{x}) \equiv H(X) - H(X|\tilde{x})$, where
$H(X|\tilde{x}) \equiv -\sum_{x \in X} p(x|\tilde{x}) \log_2 p(x|\tilde{x})$, it reads

$$I_2(X, \tilde{x}) \equiv H(X) - H(X|\tilde{x}) \leq \log_2 \frac{N}{k} \tag{8.2}$$

where N is the number of tuples in the original dataset X (assumed different).
$I_2(X, \tilde{x})$ measures the identity disclosure risk for a single record. Equation 8.1
also holds in the case of perturbative masking [Bez07]; therefore $I_2$ can be used for
any kind of masking transformation.

Averaging Eq. 8.2 over $\tilde{X}$ we get:

$$I(X, \tilde{X}) \leq \log_2 \frac{N}{k}$$

So, the mutual information can be used as a risk indicator for identity disclosure
[DFRM09], but we have to stress that this condition does not guarantee
k-anonymity for every QI group $\tilde{x}$; that is, it is a necessary, although not
sufficient, condition.
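
The gap between the per-group bound (Eq. 8.2) and its average can be seen numerically. The following sketch assumes, as in the text, N distinct original records (so that H(X) = log2 N) and a uniform p(x | group) within each QI group; the function names are ours.

```python
from math import log2


def identity_risk(group_sizes):
    """Return (per-group I_2 values, mutual information), both in bits."""
    n = sum(group_sizes)
    # I_2(X, group) = H(X) - H(X | group) = log2(N) - log2(N_group).
    i2 = [log2(n) - log2(size) for size in group_sizes]
    # Mutual information: average of I_2 weighted by p(group) = N_group / N.
    mutual_info = sum(size / n * value for size, value in zip(group_sizes, i2))
    return i2, mutual_info


def satisfies_k_anonymity(group_sizes, k):
    """Check Eq. 8.2 for every QI group."""
    bound = log2(sum(group_sizes) / k)
    i2, _ = identity_risk(group_sizes)
    return all(value <= bound + 1e-12 for value in i2)


# Group sizes of Fig. 8.1(b) as we read it: every group meets the k = 2 bound.
print(satisfies_k_anonymity([3, 3, 2], k=2))

# A skewed partition: the average bound holds, but a group of size 1 breaks it.
_, average = identity_risk([1, 7])
print(average <= log2(8 / 2), satisfies_k_anonymity([1, 7], k=2))
```

The second case shows why the averaged condition is necessary but not sufficient: a single small group can hide behind a low average.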


t-closeness 


The t-closeness condition requires:

$$D(p(w|\tilde{x}) \,\|\, p(w)) \equiv \sum_{w \in W} p(w|\tilde{x}) \log_2 \frac{p(w|\tilde{x})}{p(w)} \leq t \tag{8.3}$$

for each $\tilde{x} \in \tilde{X}$. This is equivalent to the one-symbol information $I_1$
(surprise) [Fan61], that is:

$$I_1(W, \tilde{x}) \equiv \sum_{w \in W} p(w|\tilde{x}) \log_2 \frac{p(w|\tilde{x})}{p(w)} \leq t \tag{8.4}$$

$I_1(W, \tilde{x})$ is a measure of attribute disclosure risk for a QI group $\tilde{x}$, as the
difference between the prior belief about W from the knowledge of the entire distribution
p(w), and the posterior belief $p(w|\tilde{x})$ after having observed $\tilde{x}$ and the
corresponding sensitive attributes. Averaging over the set $\tilde{X}$, we get an estimation
of the disclosure risk (based on t-closeness) for the whole set [RMFDF09], as follows:

$$I(W, \tilde{X}) \equiv \sum_{\tilde{x} \in \tilde{X}} p(\tilde{x}) \sum_{w \in W} p(w|\tilde{x}) \log_2 \frac{p(w|\tilde{x})}{p(w)} \leq t \tag{8.5}$$

Again, this is a necessary but not sufficient condition to satisfy t-closeness on an
entire table, since t-closeness must be satisfied for each $\tilde{x}$.
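
Equations 8.4 and 8.5 translate directly into code. The sketch below is illustrative only: the input is our reading of Figure 8.1(b) reduced to (QI group, Illness) pairs, the probabilities are relative frequencies, and the function names and the threshold t = 1.5 are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def distribution(values):
    n = len(values)
    return {value: count / n for value, count in Counter(values).items()}


def surprise(group_values, all_values):
    """I_1(W, group): KL divergence in bits between p(w | group) and p(w)."""
    prior = distribution(all_values)
    posterior = distribution(group_values)
    return sum(p * log2(p / prior[w]) for w, p in posterior.items())


def t_closeness_report(records):
    """records: list of (QI group key, sensitive value) pairs."""
    groups = defaultdict(list)
    for key, w in records:
        groups[key].append(w)
    all_values = [w for _, w in records]
    per_group = {key: surprise(vals, all_values) for key, vals in groups.items()}
    n = len(all_values)
    # Eq. 8.5: average of I_1 weighted by the group frequency p(group).
    mutual_info = sum(len(groups[key]) / n * per_group[key] for key in groups)
    return per_group, mutual_info


RECORDS = [
    ("195*", "Measles"), ("195*", "Asthma"), ("195*", "Flu"),
    ("197*", "Flu"), ("197*", "H1N1"), ("197*", "Flu"),
    ("198*", "Gastritis"), ("198*", "Chest pain"),
]

per_group, mutual_info = t_closeness_report(RECORDS)
t = 1.5  # hypothetical threshold
print(mutual_info <= t)                           # average condition (Eq. 8.5)
print(all(v <= t for v in per_group.values()))    # per-group condition (Eq. 8.4)
```

With this threshold the average condition holds while the per-group condition fails for the smallest group, which again illustrates that Eq. 8.5 is necessary but not sufficient.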


ℓ-diversity


The ℓ-diversity condition, in terms of entropy, reads:

$$H(W|\tilde{x}) \geq \log_2 \ell$$

for each QI group $\tilde{x} \in \tilde{X}$. It can be expressed in terms of one-symbol
specific information $I_2$, as follows:

$$I_2(W, \tilde{x}) \equiv H(W) - H(W|\tilde{x}) \leq H(W) - \log_2 \ell \tag{8.6}$$

$I_2(W, \tilde{x})$ is a measure of attribute disclosure risk for a QI group $\tilde{x}$, as the
reduction of uncertainty between the prior distribution and the conditional distribution.
Averaging over the set $\tilde{X}$, we get an estimation of the average disclosure risk for
the whole set [RMFDF09]:

$$I(W, \tilde{X}) \equiv H(W) - H(W|\tilde{X}) \leq H(W) - \log_2 \ell \tag{8.7}$$

This is the ℓ-diversity condition on average. Again, this is a necessary but not
sufficient condition to satisfy ℓ-diversity for each $\tilde{x}$. Note that the mutual
information is a non-negative quantity ($I(W, \tilde{X}) \geq 0$). From Equation 8.7 it
immediately follows that H(W) is an upper bound for $\log_2 \ell$, that is,
$\log_2 \ell \leq H(W)$ or equivalently $\ell \leq \ell_{max} \equiv 2^{H(W)}$.


Comparing Risk Parameters 


Equations 8.5 and 8.7 suggest a way to compare the two risk parameters ℓ and t.
Indeed, if we equalise the maximal contribution to information of ℓ and t, we can
derive the following relation:

$$\ell_t \equiv 2^{H(W) - t} \tag{8.8}$$

$\ell_t$ tells us, for a given t, what the equivalent value ℓ is, that is, the value of ℓ that
has the same impact on the information. The advantage of Equation 8.8 is that it
allows us to express the value of the t parameter, which is often hard to set, in terms of
ℓ, which has a more intuitive meaning.

In summary, for any anonymised dataset that satisfies ℓ-diversity and t-closeness,
the following average conditions are necessary:

$$\begin{aligned}
I(W, \tilde{X}) &\leq I(W, X) \\
I(W, \tilde{X}) &\leq t \\
I(W, \tilde{X}) &\leq H(W) - \log_2 \ell
\end{aligned} \tag{8.9}$$

whereas the necessary and sufficient conditions are:

$$\begin{cases}
I_1(W, \tilde{x}) \leq t \\
I_2(W, \tilde{x}) \leq H(W) - \log_2 \ell
\end{cases} \tag{8.10}$$

for each $\tilde{x} \in \tilde{X}$.
For setting the risk parameters, lower and upper bounds for ℓ are:

$$1 \leq \ell \leq \ell_{max} \equiv 2^{H(W)} \tag{8.11}$$

and the $\ell_t$ equivalent to t is:

$$\ell_t \equiv 2^{H(W) - t}$$

which allows for expressing t in terms of the more intuitive diversity parameter ℓ.

In [LLV07], proposing t-closeness, the authors stated that “Intuitively, privacy
is measured by the information gain of an observer.” The question is which metric
we should use for measuring such information gain. In [Bez10], the author showed
that if we consider information gain as a reduction of uncertainty, the corresponding
privacy metric is similar to ℓ-diversity, whereas if we think of information gain as
the novelty of the information, t-closeness is the corresponding metric. Accordingly, 
the choice of the privacy risk metric depends on what kind of information we do not 
want to disclose, which in turn depends on the specific application, the tolerable 
level of information loss, and the attack model. The advantage of the formulation in 
terms of information theory is that it allows for expressing all the different metrics 
using comparable units (bits), and, at least in principle, use all the tools of informa- 
tion theory for optimising the privacy constraints, and, possibly, utility. 
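
As a worked instance of Equations 8.8 and 8.11, the sketch below computes the upper bound on ℓ and the ℓ equivalent to a given t from the empirical distribution of a sensitive attribute. The Illness values are our reading of Figure 8.1, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(W) in bits, estimated from relative frequencies."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def l_max(sensitive_values):
    """Upper bound on l from Eq. 8.11: 2^H(W)."""
    return 2 ** entropy(sensitive_values)


def equivalent_l(sensitive_values, t):
    """l_t from Eq. 8.8: the l with the same impact on information as t."""
    return 2 ** (entropy(sensitive_values) - t)


ILLNESS = ["Measles", "Asthma", "Flu", "Flu", "H1N1", "Flu",
           "Gastritis", "Chest pain"]

print(l_max(ILLNESS))              # between 5 and 6 for this distribution
print(equivalent_l(ILLNESS, 1.0))  # halving l_max corresponds to t = 1 bit
```

Each additional bit allowed for t halves the equivalent ℓ, and t = H(W) corresponds to ℓ = 1, the lower bound in Eq. 8.11.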


8.2.4 Protecting Privacy of Sensitive Value Distributions 


The privacy metrics described in the previous section have been designed to measure 
the risk of identity and attribute disclosure in microdata release. However, similar 
metrics can also be used for different scenarios. For instance, in [BDLS10], the au- 
thors propose a model to assess the exposure of sensitive information in a scenario 
of data release, based on the concept of one-symbol information. The scenario
concerns the incremental release of tuples of a table r defined over relation schema
$R(a_1, \ldots, a_n)$. Requested tuples, singularly considered, are not sensitive; however,
their aggregation may allow for the inference of sensitive information not explic- 
itly represented in the released dataset. In particular, inference is caused by peculiar 
value distributions of some released data. As an example, the age distribution of 
military personnel posted in a given location can allow an observer to deduce the 
sensitive nature of the location itself (e.g., a training campus or a headquarters). The
authors propose a metric able to assess the disclosure risk caused by the release of 
a tuple. 

The model defines sensitive entities in r as sensitive properties of targets, which are
values of one or more non-sensitive released attributes $Y \subseteq R$. A sensitive property
of target $y \in D_Y$, where $D_Y$ is the domain of Y, is referred to as s(y). For instance,
considering the relation schema R(Name, Age, Location) where Name is the
name of a soldier, Age is her age, and Location is the location where the soldier
is posted, targets can be the values of attribute Location (e.g., $L_1, \ldots, L_n$)
and the sensitive property of each target can be the related location type (e.g.,
$s(L_1)$ = Headquarter, ..., $s(L_n)$ = TrainingCampus). Inferences on the sensitive
property s(y) of target y can be caused by the distribution of values of some other
attributes $X \subseteq R$ for the specific y (i.e., P(X|y)), when there is a dependency between
attributes X and Y. For instance, X = {Age}: in fact, $P(\text{Age}|L_i)$ can reveal the
location type (sensitive property) of target $L_i$. Dependency between attributes X and Y
introduces an inference channel in a dataset concerning such attributes. In particular,
inferences on the values of s(y) arise when P(X|y) shows peculiarities, differing
from a typical (baseline) distribution of X that is expected and publicly known. The
data model proposes the definition of X-outliers, targets for which the difference
between P(X|y) and the baseline exceeds a given threshold. X-outliers show unusual
distributions of P(X|y), and this peculiarity induces in the observer an information
gain that exposes the value of sensitive property s(y) [BDLS10]. To assess such
information gain, the authors resort to $I_1(y, X)$, and compare it to the average value
represented by the mutual information I(X, Y). In this way, the authors can evaluate
when the contribution of a particular target to the mutual information is unusually
large, exposing as a consequence the value of the related sensitive property.

We note, however, that since observers do not have access to the original rela- 
tion (supposed to be stored in a trusted environment), they can only see and learn 
distributions from the released collection of data: for this reason, the definition of 
actual X-outliers of the original relation may not fit what is observable. Therefore, 
the authors suggest that the evaluation of single target contributions to the mutual
information be performed at runtime, on the released dataset, after a sufficient number
of tuples has been released to make the observable distributions representative of
the original relation content. Therefore, in [BDLS10], the authors propose the
evaluation of one-symbol information $I_1(y, X)$ on the released dataset as a means for
quantitatively assessing the risk of the release of a requested tuple.
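
The comparison of a target's one-symbol information with the mutual information can be sketched as follows. This is not the algorithm of [BDLS10]: we assume that the baseline is the marginal distribution of X over the tuples released so far, that $I_1(y, X)$ is the divergence between P(X|y) and that marginal, and that a target counts as exposed when its contribution exceeds a hypothetical multiple `factor` of the mutual information.

```python
from collections import Counter, defaultdict
from math import log2


def one_symbol_information(released):
    """released: list of (target y, value x) pairs released so far.

    Returns ({y: I_1(y, X)}, I(X, Y)), both in bits, estimated from the
    released tuples only.
    """
    n = len(released)
    p_x = {x: c / n for x, c in Counter(x for _, x in released).items()}
    by_target = defaultdict(list)
    for y, x in released:
        by_target[y].append(x)
    i1 = {}
    for y, xs in by_target.items():
        p_x_given_y = {x: c / len(xs) for x, c in Counter(xs).items()}
        i1[y] = sum(p * log2(p / p_x[x]) for x, p in p_x_given_y.items())
    # Mutual information: average of I_1 weighted by p(y).
    mutual_info = sum(len(by_target[y]) / n * i1[y] for y in i1)
    return i1, mutual_info


def exposed_targets(released, factor=2.0):
    """Targets whose contribution is unusually large (hypothetical rule)."""
    i1, mutual_info = one_symbol_information(released)
    return sorted(y for y, value in i1.items() if value > factor * mutual_info)


# Toy release: locations L1 and L2 have a balanced age mix, L3 only young staff.
released = ([("L1", "young")] * 5 + [("L1", "senior")] * 5
            + [("L2", "young")] * 5 + [("L2", "senior")] * 5
            + [("L3", "young")] * 4)
print(exposed_targets(released))
```

In this toy release only L3 is reported, because its age distribution departs markedly from the overall one.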


8.3 Privacy Protection Techniques 


Different protection techniques have been proposed to guarantee the privacy of the 
data when they are stored and managed by a third party (i.e., a service provider). 
Since the service provider is not under the direct control of the data owner, the pri- 
vacy of the data may be put at risk. Traditional solutions (e.g., [DFPS07]) for pro- 
tecting privacy of outsourced data assume that an overlying layer of encryption is 
applied on data before outsourcing. These solutions, however, significantly reduce query
evaluation efficiency. Also, they are based on the assumption that outsourced data 
are all equally sensitive, and therefore encryption is a price to be paid for their pro- 
tection. However, in many scenarios, what is sensitive is the association among data, 
more than data items per se. In the following sections of this chapter, we will de- 
scribe novel privacy protection techniques that, based on this observation, limit the 
use of encryption (or avoid it completely) by complementing it with fragmentation. 


8.3.1 Basic Concepts 


MEDICALDATA

SSN       | Name    | DoB      | ZIP   | Illness     | Treatment
123456789 | Alice   | 84/07/31 | 26015 | Pharyngitis | Antibiotic
231546586 | Bob     | 82/05/20 | 26010 | Flu         | Aspirin
378565241 | Carol   | 20/01/30 | 50010 | Gastritis   | Antacid
489754278 | David   | 80/07/02 | 20015 | Broken Leg  | Physiotherapy
589076542 | Emma    | 75/02/07 | 26018 | Gastritis   | None
675445372 | Fred    | 75/02/17 | 26013 | Asthma      | Bronchodilator
719283746 | Gregory | 70/05/04 | 26020 | Diabetes    | Insulin
812345098 | Henrik  | 65/12/08 | 20010 | Cancer      | Chemotherapy

(a)

c0 = {SSN}
c1 = {Name, DoB}
c2 = {Name, ZIP}
c3 = {Name, Illness}
c4 = {Name, Treatment}
c5 = {DoB, ZIP, Illness}
c6 = {DoB, ZIP, Treatment}

(b)


Fig. 8.2: An example of relation (a) and of a set of well-defined constraints over it (b).


Privacy requirements can be conveniently modeled through confidentiality
constraints. A confidentiality constraint c over relation schema $R(a_1, \ldots, a_n)$ is a
subset of attributes of R, i.e., $c \subseteq R$. The semantics of a confidentiality constraint c
are that, for each tuple in r, the joint visibility of the values of the attributes in c is
sensitive, and needs to be protected. In particular, depending on the number of attributes
involved, confidentiality constraints can be classified as follows.


• Singleton constraints. A singleton constraint states that the values of the attribute
involved in the constraint are sensitive and cannot be released. For instance, the
SSN of patients of a given hospital must be protected from disclosure.

• Association constraints. An association constraint states that the association
among the values of the attributes in the constraint is sensitive and cannot be
released. Even if, singularly taken, the values of the attributes in the constraint
are not sensitive, the joint visibility of their values must be prevented. For instance,
the association between the Name and the Illness of a patient has to
be protected from disclosure.


The definition of confidentiality constraints, while immediate, provides great
flexibility in the characterisation of the desired protection requirements. We note
that the satisfaction of a constraint $c_i$ implies the satisfaction of any constraint $c_j$
such that $c_i \subseteq c_j$. Therefore, a set $\mathcal{C} = \{c_1, \ldots, c_m\}$ of confidentiality
constraints is said to be well-defined if and only if $\forall c_i, c_j \in \mathcal{C}$, $i \neq j$,
$c_i \not\subset c_j$.
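
The well-definedness condition is easy to check on sets of attribute names. The sketch below is an illustration with hypothetical function names; it also shows how a set that is not well-defined can be reduced by dropping every constraint that contains a smaller one, since the smaller constraint already implies it.

```python
def is_well_defined(constraints):
    """True if no constraint is a subset of a different constraint."""
    sets = [frozenset(c) for c in constraints]
    return all(not a <= b
               for i, a in enumerate(sets)
               for j, b in enumerate(sets) if i != j)


def minimise(constraints):
    """Drop every constraint that strictly contains another constraint."""
    sets = {frozenset(c) for c in constraints}
    return {c for c in sets if not any(other < c for other in sets)}


# The constraints of Fig. 8.2(b).
CONSTRAINTS = [
    {"SSN"},
    {"Name", "DoB"},
    {"Name", "ZIP"},
    {"Name", "Illness"},
    {"Name", "Treatment"},
    {"DoB", "ZIP", "Illness"},
    {"DoB", "ZIP", "Treatment"},
]

print(is_well_defined(CONSTRAINTS))                      # prints True
print(is_well_defined(CONSTRAINTS + [{"SSN", "Name"}]))  # prints False
```

In the second call, {SSN, Name} is redundant because protecting SSN alone (c0) already prevents the joint release of SSN and Name.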


Example 8.1. Figure 8.2 illustrates an example of relation (MEDICALDATA) along 
with a set of well defined confidentiality constraints, modeling the following privacy 
requirements: 


• the list of SSNs of patients is considered sensitive (c0);

• the association of patients’ names with any other information in the relation is
considered sensitive (c1, ..., c4);

• attributes DoB and ZIP can work as a quasi-identifier [Sam01] and therefore can
be exploited to infer the identity of patients; as a consequence, their associations
with both Illness and Treatment are considered sensitive (c5 and c6).


Given a relation r defined over relation schema $R(a_1, \ldots, a_n)$, the set $\mathcal{C}$ of
confidentiality constraints defined over R identifies the privacy requirements that must
be enforced when outsourcing r. Most of the proposals in the literature have put
forward the idea of combining fragmentation and encryption techniques to satisfy
confidentiality constraints. Both encryption and fragmentation, consistently with the
formulation of confidentiality constraints, operate at the attribute level, that is, they
involve attributes in their entirety. In particular, these techniques work as follows.


• Encryption. The values assumed by the attribute are encrypted, tuple by tuple, to 
make them visible only to authorised users. 

• Fragmentation. The attributes in R are partitioned in different subsets, referred 
to as fragments, that are outsourced instead of R, to make attributes in different 
fragments jointly visible only to authorised users. A fragment F_i of a relation R 
is a subset of the attributes in R (i.e., F_i ⊆ R), and a fragmentation is a set of 
fragments over R (i.e., ℱ = {F_1,...,F_m}). A fragment instance of F_i is the set of 
tuples obtained by projecting the tuples in r over the attributes composing F_i. 
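
As an illustration of the notion of fragment instance, the sketch below projects a few tuples of MEDICALDATA over a fragment. It assumes that tuples are modelled as Python dictionaries; the function name is hypothetical, and duplicates are removed because the definition speaks of a set of tuples.

```python
# Three tuples of MEDICALDATA (SSN omitted for brevity).
RELATION = [
    {"Name": "Alice", "DoB": "84/07/31", "ZIP": "26015",
     "Illness": "Pharyngitis", "Treatment": "Antibiotic"},
    {"Name": "Carol", "DoB": "20/01/30", "ZIP": "50010",
     "Illness": "Gastritis", "Treatment": "Antacid"},
    {"Name": "Emma", "DoB": "75/02/07", "ZIP": "26018",
     "Illness": "Gastritis", "Treatment": "None"},
]

def fragment_instance(relation, fragment):
    """Project every tuple over the attributes of the fragment, dropping duplicates."""
    attrs = sorted(fragment)
    instance = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in instance:
            instance.append(projected)
    return instance

print(fragment_instance(RELATION, {"Illness", "Treatment"}))
print(fragment_instance(RELATION, {"Illness"}))
```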


Fragmentation and encryption can be combined in different ways to enforce 
confidentiality constraints. In particular, proposals in the literature differ on how the 
original relation schema R is fragmented and whether encryption is applied. The first 
strategy proposed in the literature to enforce confidentiality constraints through 
fragmentation and encryption has been illustrated in [ABG+05]. This strategy is based 
on the assumption that data can be stored on two (or more) non-communicating 
servers. The relation schema R is therefore partitioned in two (or more) different 
fragments, each stored on a different server. The confidentiality constraints that 
cannot be enforced by means of fragmentation are solved by encrypting at least one of 
the attributes in the constraint. Since the assumption of the presence of two (or 
more) non-communicating servers is limiting in practice (collusion among servers 
can compromise the protection of sensitive data), new techniques have recently been 
proposed. These solutions couple fragmentation and encryption so that all the 
fragments in ℱ can possibly be stored on a single server. In Sections 8.4 and 8.5, we 
describe these techniques in more detail. 


8.4 Fragmentation and Encryption 


The proposal illustrated in [CDF+07], and refined in [CDF+09a, CDF+10], suggests 
the combined use of fragmentation and encryption techniques to enforce 
confidentiality constraints, while removing the need for storing fragments on 
non-communicating servers. In this section, we illustrate the data fragmentation model 
proposed in [CDF+07], we briefly describe how to compute a fragmentation enforcing 
a set of confidentiality constraints, and we finally illustrate the query evaluation 
process over fragmented data. 


8.4.1 Fragmentation Model 


The technique proposed in [CDF+07] enforces confidentiality constraints through 
the combined use of fragmentation and encryption. Intuitively, singleton constraints 
can only be enforced by means of encryption. Association constraints can be 
enforced via either fragmentation, storing attributes in the constraint in different 
fragments, or encryption, encrypting at least one of the attributes in the constraint. An 
advantage of this approach is that fragments can all be stored on the same server, 
since only authorised users, who know the encryption key, can join them. A 
fragmentation ℱ = {F_1,...,F_m} is correct if and only if the following two conditions 
hold: i) ∀c ∈ 𝒞, ∀F ∈ ℱ, c ⊈ F; and ii) ∀F_i,F_j ∈ ℱ, i ≠ j, F_i ∩ F_j = ∅. The first 
condition states that a fragment cannot contain in the clear all the attributes 
composing a confidentiality constraint (constraint satisfaction). The second condition 
states that fragments must be disjoint, since otherwise common attributes could be 
exploited to join fragments and reconstruct sensitive associations. 
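
The two correctness conditions translate directly into set operations. The sketch below is only an illustration, assuming fragments and constraints are modelled as Python sets; the constraint list is our reading of Example 8.1 and the function name is hypothetical.

```python
from itertools import combinations

CONSTRAINTS = [
    {"SSN"},
    {"Name", "DoB"}, {"Name", "ZIP"}, {"Name", "Illness"}, {"Name", "Treatment"},
    {"DoB", "ZIP", "Illness"}, {"DoB", "ZIP", "Treatment"},
]

def is_correct(fragmentation, constraints):
    """Check conditions i) and ii) of a correct fragmentation."""
    fragments = [set(f) for f in fragmentation]
    # i) no fragment contains all the attributes of a constraint
    satisfied = all(not set(c) <= f for c in constraints for f in fragments)
    # ii) fragments are pairwise disjoint
    disjoint = all(not (f1 & f2) for f1, f2 in combinations(fragments, 2))
    return satisfied and disjoint

print(is_correct([{"Name"}, {"DoB", "ZIP"}, {"Illness", "Treatment"}], CONSTRAINTS))
print(is_correct([{"Name", "DoB"}, {"ZIP"}, {"Illness", "Treatment"}], CONSTRAINTS))
```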

At a physical level, each fragment F_i ∈ ℱ is translated into a physical fragment 
that contains all the attributes in F_i in the clear, while all the other attributes in R are 
encrypted. Reporting all the attributes of R in each fragment, in either encrypted or 
plain form, permits the execution of any query on a single physical fragment, thus 
making the query evaluation process more efficient. The schema of the physical 
fragment F_i^e storing fragment F_i is F_i^e(salt, enc, a_i1,...,a_in), where: 


• salt is the primary key of F_i^e and contains a randomly chosen value; 

• enc represents the encryption of all the attributes in R that do not belong to F_i, 
combined before encryption via a binary XOR with a salt, to prevent 
frequency-based attacks; 

• a_i1,...,a_in are the attributes in fragment F_i. 


Since the availability of plaintext attributes in a fragment makes query evaluation 
more efficient, fragmentation should be preferred to encryption in constraint 
satisfaction. To minimise the adoption of encryption, and maximise visibility, each 
attribute not appearing in a singleton constraint must belong to at least one fragment. 
As a consequence, a fragmentation that is correct and maximises visibility requires 
that each attribute not involved in a singleton constraint appears in plaintext in 
exactly one fragment. 


Example 8.2. Consider relation MEDICALDATA in Figure 8.2a and the set of 
constraints over it in Figure 8.2b. An example of a correct fragmentation that maximises 
visibility is ℱ = {{Name}, {DoB, ZIP}, {Illness, Treatment}}. Figure 8.3 
illustrates the fragment instances over the physical fragments corresponding to ℱ. 
Note that only the attribute SSN does not appear in the clear in the fragments, since 
it belongs to singleton constraint c_0. 


F_1^e
salt  | enc | Name
s_1^1 | α   | Alice
s_2^1 | β   | Bob
s_3^1 | γ   | Carol
s_4^1 | δ   | David
s_5^1 | ε   | Emma
s_6^1 | ζ   | Fred
s_7^1 | η   | Gregory
s_8^1 | θ   | Henrik

F_2^e
salt  | enc | DoB      | ZIP
s_1^2 | ϑ   | 84/07/31 | 26015
s_2^2 | ι   | 82/05/20 | 26010
s_3^2 | κ   | 20/01/30 | 50010
s_4^2 | λ   | 80/07/02 | 20015
s_5^2 | μ   | 75/02/07 | 26018
s_6^2 | ν   | 75/02/17 | 26013
s_7^2 | ξ   | 70/05/04 | 26020
s_8^2 | ο   | 65/12/08 | 20010

F_3^e
salt  | enc | Illness     | Treatment
s_1^3 | π   | Pharyngitis | Antibiotic
s_2^3 | ρ   | Flu         | Aspirin
s_3^3 | σ   | Gastritis   | Antacid
s_4^3 | τ   | Broken Leg  | Physiotherapy
s_5^3 | υ   | Gastritis   | None
s_6^3 | φ   | Asthma      | Bronchodilator
s_7^3 | χ   | Diabetes    | Insulin
s_8^3 | ψ   | Cancer      | Chemotherapy


Fig. 8.3: An example of a correct fragmentation in the multiple fragments scenario. 


8.4.2 Minimal Fragmentation 


Given a relation schema R and a set 𝒞 of confidentiality constraints over it, there 
may exist different correct fragmentations that maximise visibility. As an example, 
a fragmentation ℱ with a singleton fragment for each attribute a ∈ R that does not 
belong to a singleton constraint is correct and maximises visibility. However, this 
solution highly impacts the efficiency of the evaluation of queries involving more 
than one attribute. It is therefore necessary to identify a fragmentation that i) is 
correct, ii) maximises visibility, and iii) maximises efficiency in query evaluation. 

Several criteria have been proposed for designing an optimal fragmentation, aiming 
at reducing the cost of query evaluation. A simple approach consists in minimising 
the number of fragments in ℱ. The rationale behind this metric is that reducing the 
number of fragments implies that more attributes are stored in the clear in the same 
fragment, thus improving query execution efficiency. The problem of computing 
a fragmentation with the minimum number of fragments is however NP-hard (the 
hypergraph colouring problem [GJ79] reduces to it). It is therefore necessary to 
determine a heuristic algorithm able to compute a good, even if not optimal, 
fragmentation. To this purpose, in [CDF+07], the authors introduce the concept of dominance 
between two fragmentations. A fragmentation ℱ' dominates a fragmentation ℱ, 
denoted ℱ ≺ ℱ', if ℱ' can be obtained by merging two (or more) fragments in ℱ. 
The dominance relation clearly states that a fragmentation ℱ' is more convenient 
than a fragmentation ℱ such that ℱ ≺ ℱ'. The heuristic proposed in [CDF+07] 
is therefore based on the computation of a minimal fragmentation with respect to 
the dominance relationship ≺. A minimal fragmentation is a fragmentation ℱ that 
fulfills the following three requirements: i) ℱ is correct; ii) ℱ maximises visibility; 
and iii) there is not another fragmentation ℱ' that is correct, that maximises 
visibility and such that ℱ ≺ ℱ'. 
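
To make the notion of minimality with respect to dominance concrete, the sketch below greedily merges fragments until no further merge is possible. This is not the algorithm of [CDF+07]; it is a simple illustration under our own assumptions (sets for fragments and constraints, the constraints of Example 8.1, hypothetical function names). Since a violated constraint stays violated in any superset, a fragmentation in which no pair of fragments can be merged admits no merge of more fragments either.

```python
from itertools import combinations

SCHEMA = ["SSN", "Name", "DoB", "ZIP", "Illness", "Treatment"]
CONSTRAINTS = [
    {"SSN"},
    {"Name", "DoB"}, {"Name", "ZIP"}, {"Name", "Illness"}, {"Name", "Treatment"},
    {"DoB", "ZIP", "Illness"}, {"DoB", "ZIP", "Treatment"},
]

def violates(fragment, constraints):
    """True if the fragment exposes all the attributes of some constraint."""
    return any(set(c) <= fragment for c in constraints)

def greedy_minimal_fragmentation(schema, constraints):
    # Attributes in singleton constraints can only be protected by encryption.
    encrypted = {a for c in constraints if len(c) == 1 for a in c}
    # Start from one fragment per remaining attribute (correct, maximal visibility).
    fragments = [{a} for a in schema if a not in encrypted]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(fragments)), 2):
            union = fragments[i] | fragments[j]
            if not violates(union, constraints):
                fragments[i] = union      # merge fragment j into fragment i
                del fragments[j]
                merged = True
                break
    return fragments

print(greedy_minimal_fragmentation(SCHEMA, CONSTRAINTS))
```

On the running example, this sketch returns the fragmentation of Example 8.2; the result in general depends on the order in which pairs are examined.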

To provide a more precise measure of the quality of a fragmentation, alternative 
solutions have been proposed. In line with traditional fragmentation solutions, 
the quality of a fragmentation can be measured in terms of the affinity between the 
attributes represented in the clear in the same fragment [CDF+10]. Since attribute 
affinity permits only the measurement of the advantage of storing in the same 
fragments pairs of attributes, in [CDF+09a], the authors proposed a more precise 
approach, based on the estimation of query evaluation costs, considering the expected 
query workload on ℱ. 

We note that both the problems of maximising affinity and minimising query 
evaluation costs when computing a correct fragmentation are NP-hard. Following 
this observation, ad-hoc heuristics have been proposed, tailored to the specific 
fragmentation metric [CDF+09a, CDF+10]. These algorithms exploit the fact that both 
affinity [CDF+10] and cost functions [CDF+09a] are monotonic with respect to the 
dominance relationship ≺. This means that, as the number of attributes represented 
in the clear in the same fragment increases, the quality of the fragmentation also 
increases. 


8.4.3 Query Evaluation 


A clear effect of the fragmentation process is that queries formulated by users on 
relation schema R must be evaluated on physical fragments, instead of on the original 
relation. As introduced in Section 8.4.1, each physical fragment stored in place 
of relation schema R contains, in either encrypted or clear form, all the attributes in 
R. As a consequence, each query can be evaluated by accessing one physical fragment 
only (the one offering the best performance). Let us consider, for simplicity, 
a query q operating on relation R, of the form SELECT A FROM R WHERE Cond, 
where A is a subset of attributes in R, and Cond = ∧_i cond_i is a conjunction of basic 
predicates of the form (a_i op v), (a_i op a_j), or (a_i IN {v_1,...,v_k}), where a_i, 
a_j ∈ R, {v, v_1,...,v_k} are constant values in the domain of attribute a_i, and op is a 
comparison operator in {=, ≠, >, <, ≥, ≤}. For simplicity, in the following, we will 
use notation Attr(cond) to represent the attributes on which basic condition cond is 
defined. 

Given the fragment F on which the query must be evaluated, first the condition 
in the WHERE clause of the query is split into two sub-conditions, depending on the 
party (server or client) that can evaluate it, as follows: 


• Cond_s = ∧_i cond_i : Attr(cond_i) ⊆ F is the conjunction of conditions involving 
attributes included in the chosen fragment that, being reported in plaintext, can 
be evaluated by the server; 

• Cond_c = ∧_i cond_i : Attr(cond_i) ⊆ (R \ F) is the conjunction of basic conditions 
involving attributes not included in the fragment that, being encrypted, can be 
evaluated only by the client. 


After condition Cond has been split, the original query is translated into two 
queries: q_s operating on the server side, and q_c operating at the client side on the 
result R_s of q_s. The query executed by the server operates on the selected physical 
fragment F^e, evaluates condition Cond_s, and returns salt, enc and the attributes in 
F that appear in the SELECT clause of q. When the client receives the result R_s of 
query q_s, it decrypts attribute enc and evaluates, on the resulting tuples, condition 
Cond_c. Finally, the client projects the result on the attributes in A. Note that if Cond_c 
is empty and all the attributes in A appear in the clear in F, q_c does not need to be 
executed and q_s does not need to return attributes salt and enc, since R_s coincides 
with the result of the original query q. 


Original query:
  q   := SELECT Treatment
         FROM MedicalData
         WHERE Illness='Asthma' AND ZIP='26013'

Translated queries:
  q_s := SELECT salt, enc, Treatment
         FROM F_3^e
         WHERE Illness='Asthma'
  q_c := SELECT Treatment
         FROM Decrypt(R_s.enc, salt, k)
         WHERE ZIP='26013'


Fig. 8.4: An example of query translation in the multiple fragments scenario. 
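
The two-phase evaluation described in this section can be sketched as follows. This is only an illustration under several assumptions of ours: conditions are restricted to equality predicates (attribute, value); any condition not fully evaluable on F is left to the client; and decryption is replaced by a lookup table (VAULT, toy_decrypt), a stand-in that performs no real cryptography. All names are hypothetical.

```python
FRAGMENT = {"Illness", "Treatment"}   # F_3 of Example 8.2

# Three tuples of the physical fragment F_3^e; enc is an opaque placeholder.
F3E = [
    {"salt": "s3", "enc": "blob3", "Illness": "Gastritis", "Treatment": "Antacid"},
    {"salt": "s5", "enc": "blob5", "Illness": "Gastritis", "Treatment": "None"},
    {"salt": "s6", "enc": "blob6", "Illness": "Asthma", "Treatment": "Bronchodilator"},
]

# Stand-in for decryption: what the client would obtain from enc, salt and key k.
VAULT = {
    "blob3": {"Name": "Carol", "DoB": "20/01/30", "ZIP": "50010"},
    "blob5": {"Name": "Emma", "DoB": "75/02/07", "ZIP": "26018"},
    "blob6": {"Name": "Fred", "DoB": "75/02/17", "ZIP": "26013"},
}

def toy_decrypt(enc, salt):
    return VAULT[enc]

def split_condition(cond, fragment):
    """Split Cond into Cond_s (plaintext attributes) and Cond_c (the rest)."""
    cond_s = [(a, v) for a, v in cond if a in fragment]
    cond_c = [(a, v) for a, v in cond if a not in fragment]
    return cond_s, cond_c

def server_query(physical, cond_s, select, fragment):
    """q_s: filter on plaintext attributes; return salt, enc and selected attributes."""
    keep = ["salt", "enc"] + [a for a in select if a in fragment]
    return [{k: row[k] for k in keep} for row in physical
            if all(row[a] == v for a, v in cond_s)]

def client_query(r_s, cond_c, select):
    """q_c: decrypt enc, evaluate Cond_c, and project on the selected attributes."""
    result = []
    for row in r_s:
        full = {**row, **toy_decrypt(row["enc"], row["salt"])}
        if all(full[a] == v for a, v in cond_c):
            result.append({a: full[a] for a in select})
    return result

def evaluate(select, cond):
    cond_s, cond_c = split_condition(cond, FRAGMENT)
    return client_query(server_query(F3E, cond_s, select, FRAGMENT), cond_c, select)

print(evaluate(["Treatment"], [("Illness", "Asthma"), ("ZIP", "26013")]))
```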


Example 8.3. Consider relation MEDICALDATA in Figure 8.2a, the set of constraints 
over it in Figure 8.2b, and the fragmentation in Figure 8.3. Suppose now that a 
client formulates a query q = SELECT Treatment FROM MedicalData WHERE 
Illness='Asthma' AND ZIP='26013' returning the treatments for asthma adopted 
in zone 26013. The query can be translated to operate on any of the three fragments, 
but the evaluation using F_3^e is more convenient than using F_1^e or F_2^e, since F_3^e contains 
a subset of the attributes on which q formulates its conditions. The translation of 
query q in the corresponding queries operating on the server and on the client side, 
using F_3^e, is illustrated in Figure 8.4. 


8.5 Departing from Encryption 


The combined use of fragmentation and encryption presents several advantages, 
since the use of encryption is limited to the enforcement of singleton constraints. 
Although cryptographic tools today have a limited computational cost, 
encryption carries the burden of key management and of expensive query evaluation 
over encrypted data. To overcome these limitations, in [CDF+09b], the authors put 
forward the idea of completely departing from encryption for constraint satisfaction. 
This solution is based on the assumption that the data owner is willing to store 
a limited portion of the data to enforce confidentiality constraints. In this section, we 
illustrate the data fragmentation model proposed in [CDF+09b], we briefly describe 
how to compute a fragmentation that enforces confidentiality constraints while 
minimising the data owner's workload, and we finally discuss how queries can be 
evaluated on fragmented data. 


8.5.1 Fragmentation Model 


The proposal illustrated in [CDF+09b] enforces confidentiality constraints by storing 
a subset of the data at the owner side, while the remaining information is outsourced 
to a service provider. Intuitively, confidentiality constraints are enforced by 
storing at least one attribute for each constraint on the data owner's side. In this 
scenario, the fragmentation process produces a pair ℱ = (F_o, F_s) of fragments, where 
F_o is stored at the data owner, and F_s is stored on the external storage server. A 
fragmentation ℱ = (F_o, F_s) is correct if and only if it satisfies the following two 
conditions: i) ∀c ∈ 𝒞, c ⊈ F_s; and ii) F_o ∪ F_s = R. The first condition states that 
fragment F_s does not violate any confidentiality constraint. Note that fragment F_o 
does not need to satisfy this condition, since it is stored on the data owner's side and 
is therefore accessible only to authorised users. The second condition states that all 
the attributes in R must be represented either at the data owner or on the external 
server, to prevent loss of information. Although the data owner is willing to store 
a portion of the data, her storage capacity is limited. As a consequence, attributes 
that belong to F_s should not be replicated in F_o as well, since redundancy is not 
required. In other words, F_o and F_s should be disjoint, to avoid the replication 
of attributes already stored on the server side also on the data owner's side. At the 
physical level, however, the two fragments must have a common key attribute 
necessary to reconstruct the content of the original relation r (lossless join property). This 
attribute can be either the primary key of relation R, if it is not sensitive, or an 
attribute that does not belong to the schema of the original relation R and that is added 
to both fragments after the fragmentation process. At a physical level, a fragmentation 
ℱ = (F_o, F_s), where F_o = {a_o1,...,a_oi} and F_s = {a_s1,...,a_sj}, is translated 
into physical fragments F_o^e(tid, a_o1,...,a_oi) and F_s^e(tid, a_s1,...,a_sj), where tid is 
the common tuple identifier. 
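
The conditions above, together with the translation into physical fragments sharing a tuple identifier, can be illustrated as follows. The sketch assumes sets for fragments and constraints and dictionaries for tuples; the constraints are our reading of Example 8.1, and the function names are hypothetical.

```python
SCHEMA = ["SSN", "Name", "DoB", "ZIP", "Illness", "Treatment"]
CONSTRAINTS = [
    {"SSN"},
    {"Name", "DoB"}, {"Name", "ZIP"}, {"Name", "Illness"}, {"Name", "Treatment"},
    {"DoB", "ZIP", "Illness"}, {"DoB", "ZIP", "Treatment"},
]

def is_correct(f_o, f_s, schema=SCHEMA, constraints=CONSTRAINTS):
    """i) F_s exposes no constraint entirely; ii) F_o and F_s together cover R."""
    protects = all(not c <= f_s for c in constraints)
    lossless = (f_o | f_s) == set(schema)
    return protects and lossless

def is_non_redundant(f_o, f_s):
    return not (f_o & f_s)

def physical_fragments(relation, f_o, f_s):
    """Add a common tid to both fragments so that the relation can be rebuilt."""
    owner, server = [], []
    for tid, row in enumerate(relation, start=1):
        owner.append({"tid": tid, **{a: row[a] for a in f_o}})
        server.append({"tid": tid, **{a: row[a] for a in f_s}})
    return owner, server

F_O, F_S = {"SSN", "Name", "ZIP"}, {"DoB", "Illness", "Treatment"}
ROW = {"SSN": "123456789", "Name": "Alice", "DoB": "84/07/31",
       "ZIP": "26015", "Illness": "Pharyngitis", "Treatment": "Antibiotic"}
owner, server = physical_fragments([ROW], F_O, F_S)
print(is_correct(F_O, F_S), is_non_redundant(F_O, F_S))
print(owner, server)
```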


Example 8.4. Consider relation MEDICALDATA in Figure 8.2a and the set of 
well-defined constraints over it in Figure 8.2b. An example of a correct and 
non-redundant fragmentation is F_o = {SSN, Name, ZIP} and F_s = {DoB, Illness, 
Treatment}. Figure 8.5 illustrates the fragment instances over the physical 
fragments corresponding to F_o and F_s. Constraint c_0 is satisfied by storing attribute SSN 
in F_o. Constraints c_1,...,c_4 are satisfied by storing attribute Name in F_o. Constraints 
c_5 and c_6 are satisfied by storing attribute ZIP in F_o. 


8.5.2 Minimal Fragmentation 


Given a relation schema R and a set 𝒞 of confidentiality constraints over it, there 
may exist different fragmentations that are both correct and non-redundant. For 
instance, consider a fragmentation ℱ = (F_o, F_s) where F_o = R and F_s = ∅. This 
fragmentation is clearly correct and non-redundant, but it is not desirable since it 
corresponds to no outsourcing. Among all the correct and non-redundant fragmentations, 
it is necessary to compute a solution that minimises the data owner's workload in 
terms of storage and/or intervention in the query evaluation process. 


F_o^e
tid | SSN       | Name    | ZIP
1   | 123456789 | Alice   | 26015
2   | 231546586 | Bob     | 26010
3   | 378565241 | Carol   | 50010
4   | 489754278 | David   | 20015
5   | 589076542 | Emma    | 26018
6   | 675445372 | Fred    | 26013
7   | 719283746 | Gregory | 26020
8   | 812345098 | Henrik  | 20010

F_s^e
tid | DoB      | Illness     | Treatment
1   | 84/07/31 | Pharyngitis | Antibiotic
2   | 82/05/20 | Flu         | Aspirin
3   | 20/01/30 | Gastritis   | Antacid
4   | 80/07/02 | Broken Leg  | Physiotherapy
5   | 75/02/07 | Gastritis   | None
6   | 75/02/17 | Asthma      | Bronchodilator
7   | 70/05/04 | Diabetes    | Insulin
8   | 65/12/08 | Cancer      | Chemotherapy


Fig. 8.5: An example of a correct fragmentation in the departing-from-encryption 
scenario. 

To compute a fragmentation that minimises the data owner’s workload, it is first 
necessary to define a metric that quantitatively measures the cost of a fragmentation. 
This metric can be defined in different ways, depending on which resource is consid- 
ered more valuable by the data owner (whose consumption should be minimised), 
and on the information available about the system workload at fragmentation time. 
Furthermore, there are different ways for measuring the consumption of the same 
resource and for combining metrics that analyse the use of different resources. As 
an example, in [CDF+09b], the authors propose four metrics, each corresponding 
to a different minimisation problem: 


• Min-Attr: minimises the number of attributes in F_o; 

• Min-Size: minimises the physical size of the attributes in F_o; 

• Min-Query: minimises the number of queries that involve at least one attribute in 
F_o; 

• Min-Cond: minimises the number of conditions in queries that are evaluated over 
attributes in F_o. 


We note that the minimisation of the number of attributes implies a minimisation 
of the storage space used at the data owner side, as well as a minimisation of the 
number of queries that require an involvement of the data owner. However, if more 
precise information about attributes’ size or the query workload is available, it is 
possible to adopt more precise metrics in the computation of a fragmentation that 
minimises storage (Min-Size) or computation and bandwidth (Min-Query and Min- 
Cond) resources. 

Independently from the fragmentation metric adopted, the problem of computing 
a minimal fragmentation is NP-hard (as demonstrated in [CDF+09b], the minimum 
hitting set problem reduces to it). As a consequence, in [CDF+09b], the authors 
propose a heuristic algorithm able to compute a minimal fragmentation in polynomial 
time, with respect to the number of attributes in R. The main advantage of this 
heuristic is its flexibility, since it can be adopted with any metric. 
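
The link with the hitting set problem suggests a simple way to experiment with the Min-Attr metric. The sketch below is not the heuristic of [CDF+09b]: it is the standard greedy approximation for hitting set, followed by a pruning pass, under our own assumptions (constraints are non-empty subsets of the schema, ties are broken by schema order, function names are hypothetical).

```python
SCHEMA = ["SSN", "Name", "DoB", "ZIP", "Illness", "Treatment"]
CONSTRAINTS = [
    {"SSN"},
    {"Name", "DoB"}, {"Name", "ZIP"}, {"Name", "Illness"}, {"Name", "Treatment"},
    {"DoB", "ZIP", "Illness"}, {"DoB", "ZIP", "Treatment"},
]

def greedy_min_attr(schema, constraints):
    """Greedy hitting set: F_o must contain at least one attribute per constraint."""
    pending = [set(c) for c in constraints]
    f_o = []
    while pending:
        # Pick the attribute hitting the largest number of pending constraints.
        best = max(schema, key=lambda a: sum(a in c for c in pending))
        f_o.append(best)
        pending = [c for c in pending if best not in c]
    # Pruning: drop attributes that are not needed to hit every constraint.
    for a in list(f_o):
        rest = set(f_o) - {a}
        if all(rest & set(c) for c in constraints):
            f_o.remove(a)
    f_s = {a for a in schema if a not in f_o}
    return set(f_o), f_s

f_o, f_s = greedy_min_attr(SCHEMA, CONSTRAINTS)
print(sorted(f_o), sorted(f_s))
```

On the running example the sketch keeps three attributes at the owner, as in Example 8.4, although it selects DoB where the example selects ZIP; both choices break constraints c_5 and c_6.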


Server-Owner strategy:
  q_s  := SELECT tid
          FROM F_s^e
          WHERE Illness='Asthma'
  q_so := SELECT Name
          FROM F_o^e JOIN R_s ON F_o^e.tid=R_s.tid
          WHERE ZIP='26013'

Owner-Server strategy:
  q_o  := SELECT tid
          FROM F_o^e
          WHERE ZIP='26013'
  q_s  := SELECT tid
          FROM F_s^e
          WHERE Illness='Asthma' AND tid IN {6}
  q_so := SELECT Name
          FROM F_o^e JOIN R_s ON F_o^e.tid=R_s.tid


Fig. 8.6: An example of query translation in the departing-from-encryption scenario. 


8.5.3 Query Evaluation 


Since the attributes composing R are partitioned in two fragments to satisfy 
confidentiality constraints, the queries formulated by users on R must be translated into 
an equivalent set of queries on F_o^e and F_s^e. To this purpose, condition Cond = ∧_i cnd_i 
in the WHERE clause of the original query q must be split into sub-conditions, 
depending on the party in the system who can evaluate them. More precisely, Cond is split 
into the following three conditions: 


• Cond_o = ∧_i cnd_i : Attr(cnd_i) ⊆ F_o is the conjunction of basic conditions that can be 
evaluated only by the data owner; 

• Cond_s = ∧_i cnd_i : Attr(cnd_i) ⊆ F_s is the conjunction of basic conditions that can be 
evaluated by the server, since they involve only attributes stored at the server; 

• Cond_so = ∧_i cnd_i : Attr(cnd_i) ∩ F_o ≠ ∅ and Attr(cnd_i) ∩ F_s ≠ ∅ is the conjunction of 
basic conditions of the form (a_i op a_j), where a_i ∈ F_o and a_j ∈ F_s, that can be 
evaluated only by the data owner, with the support of the server. 


The evaluation of a query q on R can follow two different strategies, depending 
on the order in which conditions Cond_s, Cond_o, and Cond_so are evaluated, as 
described in the following. 


• Server-Owner strategy first evaluates condition Cond_s on the server side and then 
evaluates both Cond_o and Cond_so on the data owner side. 

• Owner-Server strategy first evaluates condition Cond_o on the data owner side, 
then evaluates condition Cond_s on the server side, and finally refines the result 
evaluating Cond_so on the data owner side. 
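
The two strategies can be compared on a small in-memory example. The sketch below is illustrative only: it uses three tuples of Figure 8.5, restricts Cond_o and Cond_s to equality predicates (attribute, value), models Cond_so as pairs (a_o, a_s) compared for equality, and lets the server return whole tuples rather than only the attributes needed. All function names are hypothetical.

```python
OWNER = [   # F_o^e(tid, SSN, Name, ZIP)
    {"tid": 1, "SSN": "123456789", "Name": "Alice", "ZIP": "26015"},
    {"tid": 6, "SSN": "675445372", "Name": "Fred", "ZIP": "26013"},
    {"tid": 7, "SSN": "719283746", "Name": "Gregory", "ZIP": "26020"},
]
SERVER = [  # F_s^e(tid, DoB, Illness, Treatment)
    {"tid": 1, "DoB": "84/07/31", "Illness": "Pharyngitis", "Treatment": "Antibiotic"},
    {"tid": 6, "DoB": "75/02/17", "Illness": "Asthma", "Treatment": "Bronchodilator"},
    {"tid": 7, "DoB": "70/05/04", "Illness": "Diabetes", "Treatment": "Insulin"},
]

def matches(row, cond):
    return all(row[a] == v for a, v in cond)

def owner_join(r_s, cond_o, cond_so, select):
    """q_so: join the server result with F_o^e on tid and finish the evaluation."""
    by_tid = {s["tid"]: s for s in r_s}
    result = []
    for o in OWNER:
        s = by_tid.get(o["tid"])
        if s is None or not matches(o, cond_o):
            continue
        if all(o[a] == s[b] for a, b in cond_so):
            full = {**s, **o}
            result.append({a: full[a] for a in select})
    return result

def server_owner(select, cond_o, cond_s, cond_so=()):
    r_s = [s for s in SERVER if matches(s, cond_s)]            # q_s at the server
    return owner_join(r_s, cond_o, cond_so, select)            # q_so at the owner

def owner_server(select, cond_o, cond_s, cond_so=()):
    tids = {o["tid"] for o in OWNER if matches(o, cond_o)}     # q_o at the owner
    r_s = [s for s in SERVER                                   # q_s: the server sees tids
           if s["tid"] in tids and matches(s, cond_s)]
    return owner_join(r_s, (), cond_so, select), tids          # q_so at the owner

COND_O, COND_S = [("ZIP", "26013")], [("Illness", "Asthma")]
print(server_owner(["Name"], COND_O, COND_S))
print(owner_server(["Name"], COND_O, COND_S))
```

Returning the set of identifiers from owner_server makes explicit what the server observes under the Owner-Server strategy.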


Example 8.5. Consider relation MEDICALDATA in Figure 8.2a, the set of constraints 
over it in Figure 8.2b, the fragmentation in Figure 8.5, and query q = SELECT 
Name FROM MedicalData WHERE Illness='Asthma' AND ZIP='26013' 
returning the name of patients suffering from asthma in zone 26013. Condition 
Cond in the WHERE clause of q is split as follows: Cond_s = (Illness='Asthma'), 
Cond_o = (ZIP='26013'), and Cond_so = ∅. Figure 8.6 illustrates how q is translated 
into an equivalent set of queries, following both the Server-Owner and the 
Owner-Server strategies. 


The choice between the Server-Owner and the Owner-Server strategies depends 
on the possible leakage of information that the Owner-Server strategy may cause, 
as well as on their efficiency. In fact, if query q is not secret, the server learns the 
identifiers of the tuples satisfying condition Cond_o, possibly reconstructing sensitive 
associations. For instance, in the example in Figure 8.6, the server learns that 
the tuple with tid=6 is the unique tuple in R where ZIP='26013', thus violating 
constraints c_5 and c_6. 


8.6 Preserving Utility in Data Publication 


Although fragmentation has been proposed as a method for the enforcement of con- 
fidentiality constraints in the data outsourcing scenario, it can also be adopted in 
data publishing, since it permits the publishing of views (fragments) only on data 
that do not expose sensitive associations. To increase the utility of published in- 
formation, the fragmentation process should take into consideration the recipients’ 
needs for information and the purpose of data publication. The fragmentation pro- 
cess can therefore be driven by visibility requirements, expressing views over data 
that the fragmentation should satisfy. Furthermore, fragments can be complemented 
with the release of the sensitive associations broken by fragmentation in a sanitised 
form as loose associations, defined in a way to guarantee a specified degree of 
privacy. In this section, we describe the proposal in [DFJ+10a] for guaranteeing utility 
in data publication by both defining a fragmentation that satisfies visibility 
requirements and publishing loose associations. 


8.6.1 Visibility Requirements 


Visibility requirements model views over the data that the fragmentation process 
should guarantee. A visibility requirement v over a relation schema R(a_1,...,a_n) is a 
monotonic boolean formula over {a_1,...,a_n}. The semantics of a visibility 
requirement v are that the visibility over the view represented by v should be guaranteed 
by the fragmentation. We note that the negation operator cannot be used in the 
definition of visibility requirements, since it corresponds to requiring the non-visibility 
over values, which is already captured by confidentiality constraints. Visibility 
requirements permit the expression of different needs of visibility, as described in the 
following. 


• Attributes visibility. The values of the attribute in the visibility requirement 
should be released (v = a). 

• Association visibility. The association among the values of the attributes (views, 
in general) in the visibility requirement should be released (v = a_i ∧ ... ∧ a_j). 

• Alternative views. At least one among the views composing the visibility 
requirement should be released (v = v_i ∨ v_j). 


A fragmentation ℱ satisfies a set 𝒱 of visibility requirements if, for each 
requirement v ∈ 𝒱, there exists at least one fragment F ∈ ℱ that satisfies v. We note that 
all the visibility requirements must be satisfied, but not necessarily by the same 
fragment in ℱ. A fragmentation is therefore correct and can be published only if it 
satisfies both confidentiality constraints and visibility requirements. 
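
Checking visibility requirements amounts to evaluating monotonic formulas over fragments. The sketch below assumes that a fragment F satisfies v when v evaluates to true under the assignment that sets an attribute to true exactly when it belongs to F; formulas are encoded as nested tuples ("and", ...) and ("or", ...) with attribute names as leaves. The encoding and the function names are our own.

```python
def holds(v, fragment):
    """Evaluate a monotonic formula, treating attributes in the fragment as true."""
    if isinstance(v, str):
        return v in fragment
    op, *args = v
    results = [holds(arg, fragment) for arg in args]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unsupported operator: {op}")

def satisfies(fragmentation, requirements):
    """Every requirement must be satisfied by at least one fragment."""
    return all(any(holds(v, f) for f in fragmentation) for v in requirements)

REQUIREMENTS = [
    ("or", "SSN", "Name"),                     # v_1
    ("and", "Illness", "Treatment"),           # v_2
    ("or", "Name", ("and", "DoB", "ZIP")),     # v_3
]
FRAGMENTATION = [{"Name"}, {"DoB", "ZIP"}, {"Illness", "Treatment"}]

print(satisfies(FRAGMENTATION, REQUIREMENTS))
print(satisfies([{"Name"}, {"DoB"}, {"ZIP"}, {"Illness"}, {"Treatment"}], REQUIREMENTS))
```

The second fragmentation separates Illness from Treatment, so the association required by v_2 is not visible in any fragment.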


Example 8.6. Consider relation MEDICALDATA in Figure 8.2a. An example of a set 
𝒱 of visibility requirements is represented by: 


• v_1 = SSN ∨ Name states that either the SSN or the Name of patients should be 
released; 

• v_2 = Illness ∧ Treatment states that the association between Illnesses 
and Treatments should be released; 

• v_3 = Name ∨ (DoB ∧ ZIP) states that either the Name of patients, or the DoB and 
ZIP of patients in association should be released. 


Fragmentation ℱ = {{Name}, {DoB, ZIP}, {Illness, Treatment}} in 
Figure 8.3 satisfies the visibility requirements in 𝒱: v_1 is satisfied by F_1, v_2 is satisfied 
by F_3, and v_3 is satisfied by F_2. Fragmentation ℱ also satisfies the confidentiality 
constraints in Figure 8.2b. As a consequence, ℱ represents a correct fragmentation. 


8.6.2 Loose Associations 


Even if fragments in a correct fragmentation cannot be joined, in [DFJ+10a], the 
authors propose publishing loose associations among their tuples. A loose association 
reveals some information on the association broken by fragmentation, while 
guaranteeing that a given privacy degree is respected (to guarantee that confidentiality 
constraints cannot be violated). Intuitively, loose associations hide tuples participating 
in the associations in groups and provide information on the associations only 
at the group level (in contrast to the tuple level). 

Let us consider a pair F_l and F_r of fragments in a fragmentation ℱ, their 
instances f_l and f_r, and the set 𝒞' of confidentiality constraints completely covered 
by them, that is, ∀c ∈ 𝒞, c ∈ 𝒞' if c ⊆ F_l ∪ F_r. As an example, consider fragments F_2 
and F_3 in the fragmentation in Figure 8.3 and the confidentiality constraints in 
Figure 8.2b: 𝒞' = {c_5, c_6}. Since a loose association between F_l and F_r provides 
information about associations among groups of tuples in the fragments, the first step 
necessary to define a loose association consists in partitioning the tuples in f_l and f_r 
into groups. We note that, as described in the following, the size of the groups into 
which tuples in f_l and f_r are partitioned impacts the privacy degree guaranteed by 
the loose association. Therefore, in [DFJ+10a], the authors associate a parameter k 
with grouping, stating the lower bound to the size of groups. A k-grouping partitions 
the tuples in a fragment f in groups of size greater than or equal to k. A k-grouping is 
minimal if it minimises the size of each group (or, equivalently, maximises the 
number of groups), while respecting the lower threshold k. We note that the grouping 
parameter used for F_l can possibly be different from the grouping parameter used 
for F_r. Notation (k_l,k_r)-grouping is used to represent a k_l-grouping over F_l and a 
k_r-grouping over F_r. It is minimal if both the k_l-grouping over F_l and the k_r-grouping 
over F_r are minimal. On the basis of the (k_l,k_r)-grouping defined over F_l and F_r, it 
is possible to define a group association A, representing the relationships between 
the tuples in f_l and f_r at the group level. Intuitively, for each tuple t in the original 
relation r, A includes a tuple representing the relationship between the group 
identifiers assigned by the (k_l,k_r)-grouping to the semi-tuple l representing t in f_l 
and to the semi-tuple r representing t in f_r. As an example, Figure 8.7 graphically 
illustrates the group association, also represented as a relational table in Figure 8.8, 
induced by a (2,2)-grouping on fragments F_l = F_2 and F_r = F_3 in Figure 8.3. 


l_1 | 84/07/31 | 26015        Pharyngitis | Antibiotic     | r_1
l_2 | 82/05/20 | 26010        Gastritis   | None           | r_5

l_3 | 20/01/30 | 50010        Gastritis   | Antacid        | r_3
l_4 | 80/07/02 | 20015        Cancer      | Chemotherapy   | r_8

l_5 | 75/02/07 | 26018        Flu         | Aspirin        | r_2
l_6 | 75/02/17 | 26013        Diabetes    | Insulin        | r_7

l_7 | 70/05/04 | 26020        Asthma      | Bronchodilator | r_6
l_8 | 65/12/08 | 20010        Broken Leg  | Physiotherapy  | r_4


Fig. 8.7: An example of (2,2)-grouping. 

The protection offered by publishing group-level associations can be compromised 
if the tuples within a group have the same value for the attributes in a 
confidentiality constraint. To capture this situation, in [DFJ+10a] the authors introduce 
the definition of alikeness between the tuples in a fragment, with respect to the 
confidentiality constraints 𝒞' completely covered by F_l and F_r (which can possibly be 
exposed by the release of a (k_l,k_r)-grouping). Two tuples l_i and l_j in f_l (r_i, r_j in f_r, 
respectively) are alike, denoted l_i ≃ l_j (r_i ≃ r_j, respectively), if there exists at least 
a constraint c in 𝒞' such that l_i[c ∩ F_l] = l_j[c ∩ F_l] (r_i[c ∩ F_r] = r_j[c ∩ F_r], 
respectively). For instance, r_3 ≃ r_5 in Figure 8.7, since r_3 and r_5 assume the same value 
for attribute Illness, which is the unique attribute in c_5 ∈ 𝒞' appearing in F_r. 

A group association A enjoys a degree k of protection if every tuple in A 
indistinguishably corresponds to at least k distinct associations among tuples in the 
fragments. Practically, A is k-loose if for each group g_l in the left fragment (group 
g_r in the right fragment, respectively), the union of the tuples in all the groups with 
which g_l (g_r, respectively) is associated in A is a set that has cardinality at least k 
and that does not contain any tuples that are alike. 
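
The practical definition of k-looseness can be checked directly on the running example. The sketch below encodes the groups and the group association as we read them from Figures 8.7 and 8.8, with 𝒞' = {c_5, c_6}; the data structures and function names are our own, and the code is only an illustration of the definition.

```python
from itertools import combinations

C_PRIME = [{"DoB", "ZIP", "Illness"}, {"DoB", "ZIP", "Treatment"}]   # c_5, c_6
F_L, F_R = {"DoB", "ZIP"}, {"Illness", "Treatment"}

def rows(attrs, *values):
    return [dict(zip(attrs, v)) for v in values]

L, R = ("DoB", "ZIP"), ("Illness", "Treatment")
LEFT = {
    "dz1": rows(L, ("84/07/31", "26015"), ("82/05/20", "26010")),
    "dz2": rows(L, ("20/01/30", "50010"), ("80/07/02", "20015")),
    "dz3": rows(L, ("75/02/07", "26018"), ("75/02/17", "26013")),
    "dz4": rows(L, ("70/05/04", "26020"), ("65/12/08", "20010")),
}
RIGHT = {
    "it1": rows(R, ("Pharyngitis", "Antibiotic"), ("Gastritis", "None")),
    "it2": rows(R, ("Gastritis", "Antacid"), ("Cancer", "Chemotherapy")),
    "it3": rows(R, ("Flu", "Aspirin"), ("Diabetes", "Insulin")),
    "it4": rows(R, ("Broken Leg", "Physiotherapy"), ("Asthma", "Bronchodilator")),
}
ASSOC = [("dz1", "it1"), ("dz1", "it3"), ("dz2", "it2"), ("dz2", "it4"),
         ("dz3", "it1"), ("dz3", "it4"), ("dz4", "it3"), ("dz4", "it2")]

def alike(t1, t2, fragment, constraints):
    """Tuples are alike if they agree on the part of some constraint in the fragment."""
    for c in constraints:
        shared = c & fragment
        if shared and all(t1[a] == t2[a] for a in shared):
            return True
    return False

def is_k_loose(k, assoc, left=LEFT, right=RIGHT, constraints=C_PRIME):
    # Check left groups against right tuples, then right groups against left tuples.
    sides = [(left, right, F_R, 0, 1), (right, left, F_L, 1, 0)]
    for here, there, there_attrs, i, j in sides:
        for g in here:
            linked = sorted({pair[j] for pair in assoc if pair[i] == g})
            tuples = [t for h in linked for t in there[h]]
            if len(tuples) < k:
                return False
            if any(alike(a, b, there_attrs, constraints)
                   for a, b in combinations(tuples, 2)):
                return False
    return True

print(is_k_loose(4, ASSOC))
```

Linking dz1 to it2 instead of it3 would place the two Gastritis tuples in the same union, and the check would fail.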


          (a) F_l                 (b) A              (c) F_r
    | DoB      | ZIP   | G   |  | G_l | G_r |  | Illness     | Treatment      | G   |
l_1 | 84/07/31 | 26015 | dz1 |  | dz1 | it1 |  | Pharyngitis | Antibiotic     | it1 | r_1
l_2 | 82/05/20 | 26010 | dz1 |  | dz1 | it3 |  | Flu         | Aspirin        | it3 | r_2
l_3 | 20/01/30 | 50010 | dz2 |  | dz2 | it2 |  | Gastritis   | Antacid        | it2 | r_3
l_4 | 80/07/02 | 20015 | dz2 |  | dz2 | it4 |  | Broken Leg  | Physiotherapy  | it4 | r_4
l_5 | 75/02/07 | 26018 | dz3 |  | dz3 | it1 |  | Gastritis   | None           | it1 | r_5
l_6 | 75/02/17 | 26013 | dz3 |  | dz3 | it4 |  | Asthma      | Bronchodilator | it4 | r_6
l_7 | 70/05/04 | 26020 | dz4 |  | dz4 | it3 |  | Diabetes    | Insulin        | it3 | r_7
l_8 | 65/12/08 | 20010 | dz4 |  | dz4 | it2 |  | Cancer      | Chemotherapy   | it2 | r_8


Fig. 8.8: An example of 4-loose association. 


We note that there is a correspondence between the parameters k_l and k_r of the 
groupings and the degree k of looseness guaranteed by the group association induced 
by a (k_l,k_r)-grouping. It is immediate to see that a (k_l,k_r)-grouping cannot induce 
a k-loose association for k > k_l·k_r. To guarantee that a (k_l,k_r)-grouping induces 
a k-loose association with k = k_l·k_r, in [DFJ+10a] the authors define three 
heterogeneity properties, proving that if the (k_l,k_r)-grouping satisfies these properties, the 
induced group association is k-loose with k = k_l·k_r. The heterogeneity properties can 
be summarised as follows: 


• group heterogeneity: a group cannot contain two tuples that are alike with respect
  to the constraints in $\mathcal{C}_f$;

• association heterogeneity: a group in the left (right, respectively) fragment cannot
  be associated more than once with the same group in the right (left, respectively)
  fragment;

• deep heterogeneity: a group in the left (right, respectively) fragment cannot be
  associated with two groups in the right (left, respectively) fragment that contain
  alike tuples.


We note that a $k$-loose association is also $k'$-loose for any $k' < k$. To maximise
utility in data release, while satisfying the privacy requirement given by parameter $k$,
it is convenient to determine the $(k_l,k_r)$-grouping that guarantees $k$-looseness while
minimising the size of groups. A $(k_l,k_r)$-grouping induces a minimal group association
$A$ if $A$ is $k$-loose and there does not exist a $(k_l',k_r')$-grouping, inducing a $k$-loose
association, such that $k_l' \cdot k_r' < k_l \cdot k_r$.


Example 8.7. Consider the (2,2)-grouping of Figure 8.7. It is minimal and satisfies 
group, association, and deep heterogeneity. As a consequence, it induces a minimal 
4-loose association, illustrated in Figure 8.8. 
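
The k-looseness condition can be checked mechanically. The following Python sketch does so for the left groups of Figure 8.8, treating two right-fragment tuples as alike when they share the same Illness value. The function name, the data layout, the pair (dz4, it2), and the Illness value and group of r8 are assumptions introduced only for illustration; the symmetric check for the right groups would additionally need the alike relation on the left fragment.

```python
from itertools import combinations

# Right-fragment tuples: id -> (Illness, group). The entry for r8 is an
# assumed placeholder, not a value taken from the figure.
RIGHT = {
    "r1": ("Pharyngitis", "it1"), "r2": ("Flu", "it3"),
    "r3": ("Gastritis", "it2"), "r4": ("Broken Leg", "it4"),
    "r5": ("Gastritis", "it1"), "r6": ("Asthma", "it4"),
    "r7": ("Diabetes", "it3"), "r8": ("PLACEHOLDER", "it2"),
}

# Group association A as (left group, right group) pairs; (dz4, it2) is assumed.
ASSOC = [("dz1", "it1"), ("dz1", "it3"), ("dz2", "it2"), ("dz2", "it4"),
         ("dz3", "it1"), ("dz3", "it4"), ("dz4", "it3"), ("dz4", "it2")]


def looseness_from_left(assoc, right):
    """Largest k such that every left group satisfies the k-loose condition.

    Returns 0 as soon as the tuples reachable from a left group contain
    two alike tuples (here: two tuples with the same Illness).
    """
    k = None
    for g_l in sorted({left for left, _ in assoc}):
        linked = {r for left, r in assoc if left == g_l}
        tuples = sorted(t for t, (_, g) in right.items() if g in linked)
        if any(right[a][0] == right[b][0] for a, b in combinations(tuples, 2)):
            return 0
        k = len(tuples) if k is None else min(k, len(tuples))
    return k or 0


print(looseness_from_left(ASSOC, RIGHT))
```

Associating dz1 with it1 and it2 instead would bring r3 and r5 (both Gastritis) into the same union, and the check would fail.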


Publishing loose associations provides some information on the possible combi- 
nations of tuples in the different fragments, as it restricts the possible combinations 
to those allowed by the loose associations. As shown in [DFJ+10a], this permits
an increase in the precision of query evaluation, without violating confidentiality
constraints. 


8.7 Conclusions 


The recent trend toward the public release and the outsourcing of large data collec- 
tions, possibly including sensitive information, has required the definition of novel 
protection techniques and of specific privacy metrics to assess their effectiveness. 
In this chapter, we illustrated new privacy metrics based on information theoretic 
concepts, able to measure the contribution to the risk of disclosure of each released 
record. We then described privacy protection techniques designed to protect arbi- 
trary sensitive associations through fragmentation, possibly combined with encryp- 
tion. We finally discussed how data utility can be improved by publishing a sanitised 
version of associations broken by fragmentation. 

Privacy protection in data publishing is an interesting emerging scenario. There- 
fore, many issues still need to be investigated, such as: the definition of metrics that 
take into consideration the purpose of data release; the development of solutions for 
the automatic definition of confidentiality constraints; and the consideration of data 
dependencies. 


Chapter 9 


Selective Exchange of Confidential Data 
in the Outsourcing Scenario 


Sabrina De Capitani di Vimercati, Sara Foresti, Stefano Paraboschi, 
Gerardo Pelosi, and Pierangela Samarati 


Abstract The evolution of information and communication technologies (ICTs) has 
introduced new ways for sharing and disseminating user-generated content through 
remote storage, publishing, and disseminating services. From an enterprise oriented 
point of view, these services offer cost effective and reliable data storage features 
that any organisation can take advantage of without long setup delays and capital 
expenses. Also, from an end-user point of view, distributed and shared data storage 
services offer considerable advantages in terms of reliability and constant availabil- 
ity of data. While on one hand data sharing services encourage and enhance the 
collaboration among users, on the other hand they need to provide proper protec- 
tion of data, possibly enforcing access restrictions defined by the data owner. In this 
chapter, we present an approach for allowing users to delegate to an external service 
the enforcement of the access control policy on their resources, while at the same 
time not requiring complete trust in the external service. Our solution relies on the 
translation of the access control policy into an equivalent encryption policy on re- 
sources, and on a hierarchical key structure that exploits the relationships between 
groups or users. In this way, we limit both the number of keys to be maintained 
and the amount of encryption to be performed, while keeping a good flexibility with 
respect to policy updates and revocations. 


9.1 Introduction 


Nowadays, users are resorting more and more to external services for disseminat- 
ing and sharing resources they want to make available to others. The outsourcing 
of storage and computation to external parties then promises to become a crucial 
component of future ICT architectures. Many businesses have already invested in 
this direction, meeting significant success. This evolution is justified by clear tech- 
nological and economic trends. The correct administration and configuration of 
computing systems is expensive and presents large economies of scale, supporting
the centralisation of resources. This is particularly significant when considering
reliability and availability requirements, which are difficult to satisfy for final
users and small/medium organisations. This trend towards outsourcing is testified
by the large success of different kinds of services offering remote storage and
backup (e.g., Box.net and Mozy) and services allowing a large open community 
of users to store and exchange resources (e.g., YouTube and Flickr). These ser- 
vices assume that the service provider is completely trusted and always entitled to 
access the resources. In many scenarios, however, the service is considered honest- 
but-curious, meaning that it is relied upon for the availability of outsourced data 
but it is not authorised to access the actual data content. The solutions proposed in 
the literature to address this problem assume that the data owner encrypts her data 
before outsourcing them and communicates the encryption key only to authorised 
users [HIM02b]. While these solutions effectively provide confidentiality of the out- 
sourced data, they assume that any authorised user can access all the outsourced 
resources. Today, users increasingly demand solutions for regulating the
publication and disclosure of their own content. To enforce selective
access to encrypted resources, novel approaches integrating access control
and encryption have recently been proposed [DFJ+10c, DFJ+10b]. These approaches
are based on selective encryption, which combines data encryption with access
control enforcement (see Figure 9.1). Selective encryption consists
in encrypting different pieces of information using different keys and in selectively
distributing encryption keys to users. Each user (who should be properly
authenticated [CGP+08, GLM+04, GPSS05]) can decrypt, and therefore access, all and
only the data for which she knows the encryption key. We note that users in the
system can act as both data owner and data consumer. As a consequence, to limit the
number of keys in the system and enjoy a unique encryption policy, the approach
in [DFJ+10b] defines a key agreement solution based on the Diffie-Hellman technique.
Although selective encryption guarantees access control policy enforcement, it does
not efficiently support policy updates. To overcome this problem, novel solutions
allow delegating to the service provider the complete management, not only the
enforcement, of the authorisation policy [DFJ+07, DFJ+10c].

In this chapter, we illustrate how access control policies can be translated into 
equivalent encryption policies, guided by the principles of releasing at most one key 
to each user, and encrypting each resource at most once. We also describe a novel 
approach for enforcing policy updates, with the goal of limiting the data owner inter- 
vention. The remainder of this chapter is organised as follows. Section 9.2 presents 
some basic concepts. Section 9.3 describes how to define an encryption policy that 
correctly enforces the access control policy defined by the data owner. Section 9.4 
presents how resources are published and accessed. Section 9.5 compares the solution
illustrated in this chapter with a PGP key-management strategy, showing
the efficiency of our proposal. Section 9.6 analyses the security issues of our model. 
Section 9.7 illustrates how authorisation policy changes can be outsourced, while 
leaving to the data owner the control on policy management. Finally, Section 9.8 
concludes the chapter.


Fig. 9.1: Reference scenario.


9.2 Preliminaries 


Let us assume a set $\mathcal{U}$ of users who wish to selectively share their resources among
themselves. Each user can act as either a data owner or a data consumer. A user
plays the role of data owner when she makes her resources available to other users
in the system. A user plays the role of data consumer when she requires access
to resources owned by others. The sharing process is clearly selective since each
resource might be accessible only to a subset of users in $\mathcal{U}$, as defined by the data
owner. A shared resource can be modified only by its owner. Whenever a data owner
wishes to share a resource with other users in the system, the management of the
resource is delegated to an external service provider. The provider is trusted for
the resource management, but it should not be allowed to either act on authorisation
policies or access the resource content (honest-but-curious service). Also, the
service provider is assumed not to prevent any authorised user from accessing her
resources. In the following, given a user $u \in \mathcal{U}$, $R_u$ denotes the set of resources for
which $u$ is the owner, whilst $\mathcal{R}$ denotes the set of resources managed by the service.
Notation owner($r$), with $r \in \mathcal{R}$, represents the owner $u \in \mathcal{U}$ of $r$. Each user $u$ defines
an authorisation policy to regulate access to her own resources.


Definition 9.1 (Authorisation policy). Given a data owner $u \in \mathcal{U}$, the authorisation
policy defined by $u$ over $R_u$, denoted $P_u$, is a set of pairs of the form $(u_i, r_j)$,
where $u_i \in \mathcal{U}$ and $r_j \in R_u$.


The semantics of an authorisation $(u_i, r_j) \in P_u$ are that data owner $u$ has granted
to user $u_i$ the permission to access resource $r_j$. Given a resource $r \in \mathcal{R}$, with
$u$=owner($r$), acl($r$) denotes the access control list of $r$, that is, the set of users that
can access $r$ according to the authorisation policy $P_u$. The data owner $u$ is always a
member of the access control lists resulting from her authorisation policies.


Example 9.1. Consider a system where $\mathcal{U}$={A,B,C,D,E}, $R_A$={$r_1$,$r_2$}, $R_B$={$r_3$,$r_4$},
$R_C$={$r_5$}, $R_D$=$R_E$=$\emptyset$. The authorisation policies defined by A, B, and C are:
$P_A$={(A,$r_1$), (B,$r_1$), (A,$r_2$), (B,$r_2$), (C,$r_2$)};
$P_B$={(B,$r_3$), (D,$r_3$), (E,$r_3$), (A,$r_4$), (B,$r_4$), (C,$r_4$)};
$P_C$={(A,$r_5$), (B,$r_5$), (C,$r_5$), (D,$r_5$), (E,$r_5$)}.

According to these policies, acl($r_1$)={A,B}, acl($r_2$)={A,B,C}, acl($r_3$)={B,D,E},
acl($r_4$)={A,B,C}, and acl($r_5$)={A,B,C,D,E}.
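
The access control lists of Example 9.1 follow directly from the authorisation pairs. A minimal Python sketch, in which the dictionary layout and the function name are illustrative assumptions:

```python
from collections import defaultdict

# Authorisation policies of Example 9.1: owner -> list of (user, resource) pairs.
POLICIES = {
    "A": [("A", "r1"), ("B", "r1"), ("A", "r2"), ("B", "r2"), ("C", "r2")],
    "B": [("B", "r3"), ("D", "r3"), ("E", "r3"),
          ("A", "r4"), ("B", "r4"), ("C", "r4")],
    "C": [("A", "r5"), ("B", "r5"), ("C", "r5"), ("D", "r5"), ("E", "r5")],
}


def access_control_lists(policies):
    """Collect, for every resource, the set of users authorised to access it."""
    acl = defaultdict(set)
    for pairs in policies.values():
        for user, resource in pairs:
            acl[resource].add(user)
    return dict(acl)


ACL = access_control_lists(POLICIES)
for resource in sorted(ACL):
    print(resource, sorted(ACL[resource]))
```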


The goal is to realise a mechanism that allows data owners to securely share their 
resources only with authorised users, while preventing even the service provider 
from accessing a resource content. To this end, we employ an encryption scheme to 
enforce an authorisation policy. Encryption is applied with two objectives in mind: i) 
the efficiency, in order to minimise the number of keys managed by each user in the 
system; ii) the correct enforcement of the authorisation policy defined by the owner, 
in order to guarantee that a resource $r$ is accessible only to the users included
in the corresponding acl($r$). The proposed encryption scheme enables any user to
manage a single secret and any resource to be stored in one copy through the use of 
a single cryptographic key. A few public tables are stored on the external server and 
a user requesting access to a resource has to use a protocol that selectively returns 
the corresponding encryption key. 


9.3 Encryption Schema 


A peculiar characteristic of this scenario is that there are different owners respon- 
sible for different portions of the resources publicly available. A simple solution 
for addressing this aspect and for sharing resources in a selective way consists 
in applying the approaches developed for the data-outsourcing scenario [DFJ+07,
DFJ+10c], where a single owner, before outsourcing her resources, encrypts them
with different keys and each authorised user has a key from which she can derive 
all the keys of the resources she is authorised to access. Despite being simple, these 
solutions require each user to manage a large number of keys (potentially, one key 
for each data owner). We propose a novel solution that combines two cryptographic 
techniques: a key agreement method to share a secret key between a pair of users; a 
key derivation method that employs the secret keys shared between a pair of users 
to derive all keys used for encrypting resources that they are authorised to access. 
The integration of these two techniques results in an encryption policy that correctly 
enforces the authorisation policies $P_u$, for all $u \in \mathcal{U}$ (see Section 9.3.3).


9.3.1 Key Agreement 


The key agreement method is a slight variation of the Diffie-Hellman (DH)
key agreement method in which two users taking part in a communication
agree on a common secret through interacting with the external service. Our variation
of the DH method works as follows. Let $(G,\cdot)$ be a public algebraic cyclic
group of prime order $q=|G|$ and $\cdot$ be the internal operation of the group with
multiplicative notation. We assume that $G$ is generated by an element $g \in \mathbb{Z}_p$
(with $p = 2q+1$ and $p$, $q$ two prime integers) in such a way that $q=|G|$ and
$G=\{g^e \bmod p : 0 \le e \le q-1\}$. Each user $u \in \mathcal{U}$ chooses a secret integer parameter
$e_u \in [0, q-1]$, computes the value $g^{e_u} \in G$, and inserts $g^{e_u}$ in a public catalogue
managed by the external service that keeps track of the public parameters $g$ and $q$.
Whenever user $u$ needs to share a common secret with user $u_i$, she can efficiently
compute such a secret by querying the public catalogue to retrieve the public parameters
$g^{e_{u_i}}$ and $q$, and by applying the following key agreement function.


Definition 9.2 (Key agreement function). Given a set $\mathcal{U}$ of users, a set $\mathcal{K}$ of keys,
and a public algebraic cyclic group $(G,\cdot)$ of prime order $q$, with generator $g \in G$,
the key agreement function of a user $u \in \mathcal{U}$ is a function $ka_u : G \to \mathcal{K}$ that takes
the public parameter $g^{e_{u_i}} \in G$ of a user $u_i \in \mathcal{U}$ as input and returns the common
secret between $u$ and $u_i$ computed as: $ka_u(g^{e_{u_i}}) = (g^{e_{u_i}})^{e_u}$.


Note that according to Definition 9.2, for all pairs of users $u_i, u_j \in \mathcal{U}$, $u_i \neq u_j$,
$ka_{u_i}(g^{e_{u_j}}) = ka_{u_j}(g^{e_{u_i}})$. In the following, notation $\mathcal{K}_{\mathcal{U}}$ is used to denote the set
of key agreement functions of all users in $\mathcal{U}$. Intuitively, the evaluation of the key
agreement function of a user $u$ with respect to any other user $u_i \in \mathcal{U}$ returns a secret
key that can be exploited to encrypt the resources that should be accessible only to $u$
and $u_i$. For instance, since according to authorisation policy $P_A$ in Example 9.1
resource $r_1$ is accessible only to A and B, user A can first encrypt $r_1$ by using $ka_A(g^{e_B})$
and can then deliver the encrypted resource to the external service. By generalising,
a simple but inefficient solution for allowing a user $u$ to share a
resource $r$ with $n$ users $u_1,\ldots,u_n$ consists in computing $n$ keys $ka_u(g^{e_{u_i}})$, and in
creating $n$ copies of resource $r$ encrypted with $ka_u(g^{e_{u_i}})$, $i=1,\ldots,n$.
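
The symmetry of the key agreement function can be seen with a small Python sketch. The parameters p = 23, q = 11, g = 2 are toy values chosen here only so that p = 2q + 1 holds and g has order q; they offer no security. The names keygen, ka and CATALOGUE are illustrative assumptions, and the secret is drawn from [1, q-1] rather than [0, q-1] to avoid the trivial public value 1.

```python
import secrets

# Toy public parameters (assumption, illustration only): p = 2q + 1.
P, Q, G = 23, 11, 2   # 2 generates the subgroup of order 11 modulo 23


def keygen():
    """Return (secret e_u, public parameter g^e_u mod p) for one user."""
    e = secrets.randbelow(Q - 1) + 1
    return e, pow(G, e, P)


def ka(own_secret, other_public):
    """Key agreement function ka_u: raise the other user's public value to e_u."""
    return pow(other_public, own_secret, P)


# The public catalogue kept by the external service holds only public values.
e_a, pub_a = keygen()
e_b, pub_b = keygen()
CATALOGUE = {"A": pub_a, "B": pub_b}

# A and B obtain the same secret without ever exchanging e_A or e_B.
shared_a = ka(e_a, CATALOGUE["B"])
shared_b = ka(e_b, CATALOGUE["A"])
print(shared_a == shared_b)
```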


9.3.2 Key Derivation 


A key derivation method allows the computation of a key starting from the value 
of another key and a publicly available piece of information called token. Given a 
set $\mathcal{K}$ of keys and $k_i, k_j \in \mathcal{K}$, a token $t_{i,j}$ between them is defined as $t_{i,j}=E_{k_i}(k_j)$,
where $E$ is a symmetric encryption function.¹ Each user knowing $k_i$ can derive $k_j$ by
simply decrypting $t_{i,j}$ with $k_i$. A chain of tokens is a sequence $t_{i,l},\ldots,t_{n,j}$ such that
$t_{c,d}$ directly follows $t_{a,b}$ in the chain only if $b = c$. The concept of key derivation via
chains of tokens is formally captured by the following definition of key derivation 
function. 


Definition 9.3 (Key derivation function). Given a set $\mathcal{K}$ of keys, and a set $\mathcal{T}$ of
tokens, the direct key derivation function $\tau : \mathcal{K} \to 2^{\mathcal{K}}$ is defined as
$\tau(k_i)=\{k_j \in \mathcal{K} : \exists\, t_{i,j} \in \mathcal{T}\}$. The key derivation function $\tau^* : \mathcal{K} \to 2^{\mathcal{K}}$ is a function such that $\tau^*(k_i)$
is the set of keys derivable from $k_i$ by chains of tokens, including the key itself
(chain of length 0).
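
Tokens and token chains can be illustrated with the following Python sketch. The function E below is a toy stand-in for the symmetric encryption function (an XOR with a SHA-256 digest of the key, which is its own inverse); it is an assumption made for illustration and not the construction adopted in the chapter. The names make_token, derive and tau_star are likewise hypothetical.

```python
import hashlib


def E(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (assumption): XOR with SHA-256(key). Self-inverse."""
    pad = hashlib.sha256(key).digest()
    assert len(data) <= len(pad)
    return bytes(a ^ b for a, b in zip(data, pad))


def make_token(k_i: bytes, k_j: bytes) -> bytes:
    """Token t_{i,j} = E_{k_i}(k_j), publishable without revealing k_j."""
    return E(k_i, k_j)


def derive(k_i: bytes, chain) -> bytes:
    """Follow a chain of tokens starting from k_i, decrypting one token per step."""
    key = k_i
    for token in chain:
        key = E(key, token)
    return key


def tau_star(start, token_arcs):
    """Labels of all keys derivable from `start`, including `start` itself."""
    reached, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for src, dst in token_arcs:
            if src == current and dst not in reached:
                reached.add(dst)
                frontier.append(dst)
    return reached


k11, k21, k31 = b"a" * 16, b"b" * 16, b"c" * 16
chain = [make_token(k11, k21), make_token(k21, k31)]
print(derive(k11, chain) == k31)
```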


¹ Tokens can be defined according to different strategies (e.g., [AFB05]).

Fig. 9.2: An example of key and token graph (a) and key assignment function (b).


Graphically, a set $\mathcal{K}$ of keys and a set $\mathcal{T}$ of tokens can be represented via a key and
token graph, with a vertex $v_i$ for each key $k_i \in \mathcal{K}$, and an arc $(v_i, v_j)$ for each token
$t_{i,j} \in \mathcal{T}$. We call root a vertex in the key and token graph that does not have incoming
arcs (i.e., a vertex whose key cannot be derived via tokens). Chains of tokens
correspond to paths in the graph, and the key derivation function $\tau^*(k)$ associates
with each key $k \in \mathcal{K}$ the keys of vertices reachable from the vertex associated with
$k$ in the graph. Figure 9.2(a) illustrates an example of key and token graph, where
notation $v_{ij}$ is used to denote the $j$-th vertex (from left to right) on the $i$-th level of
the graph, and $k_{ij}$ denotes the key associated with vertex $v_{ij}$. The root vertices of
the graph are at level 1 (i.e., $v_{1j}$, $j = 1,\ldots,6$). Note that, for readability of the figure,
arrows do not appear in the graph. The graph is oriented from top to bottom. The
definition of tokens can support the general goal of encrypting resources without
introducing redundancy in their management. The idea is that whenever a resource $r$,
with $u$=owner($r$), must be accessible to $n$ users $u_1,\ldots,u_n$, the owner $u$ can encrypt
$r$ with a key $k \in \mathcal{K}$ and can compute a set of tokens that each user $u_i$, $i = 1,\ldots,n$,
can then use for deriving key $k$. For instance, according to the authorisation policy
$P_A$ in Example 9.1, user A can encrypt her resource $r_2$ with a key $k \in \mathcal{K}$ and then
can define two tokens, from $ka_A(g^{e_B})$ to $k$ and from $ka_A(g^{e_C})$ to $k$, that users B and
C, respectively, can exploit for deriving $k$. A key assignment function computes the
keys used for encrypting resources.


Definition 9.4 (Key assignment function). Given a set $\mathcal{R}$ of resources and a set
$\mathcal{K}$ of keys, the key assignment function $\phi : \mathcal{R} \to \mathcal{K}$ associates with each resource
$r \in \mathcal{R}$ the (single) key with which the resource is encrypted.


Figure 9.2(b) illustrates an example of key assignment function defined over the
resources of Example 9.1. It is easy to see that the key used for encrypting $r_5$ (i.e.,
$k_{31}$) can be derived from keys $k_{11}$, $k_{12}$, $k_{15}$, and $k_{16}$.

Fig. 9.3: An encryption policy graph.


9.3.3 Encryption Policy 


An encryption policy regulates which resources are encrypted with which keys and 
which keys can be directly or indirectly computed by which users. Formally, an 
encryption policy is defined as follows. 


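Definition 9.5 (Encryption policy). Given a set $\mathcal{U}$ of users and a set $\mathcal{R}$ of resources,
an encryption policy over $\mathcal{U}$ and $\mathcal{R}$, denoted $\mathcal{E}$, is a 6-tuple of the form
$\langle \mathcal{U}, \mathcal{R}, \mathcal{K}, \mathcal{T}, \mathcal{K}_{\mathcal{U}}, \phi \rangle$, where $\mathcal{K}$ is a set of keys, $\mathcal{T}$ is a set of tokens defined over
$\mathcal{K}$, $\mathcal{K}_{\mathcal{U}}$ is the set of key agreement functions of all users in $\mathcal{U}$, and $\phi$ is a key
assignment function.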


An encryption policy can be represented via a graph, called encryption policy graph, 
obtained from the key and token graph corresponding to $\mathcal{K}$ and $\mathcal{T}$ by adding a vertex
for each user $u \in \mathcal{U}$, and by adding an edge from each vertex representing $u$ to
vertices representing keys $ka_u(g^{e_{u_i}})$, for all $u_i \in \mathcal{U}$, $u \neq u_i$. The vertices representing
pairs of users are inserted in the graph if and only if there is at least one token 
starting from them. Figure 9.3 illustrates an example of encryption policy graph, 
where each vertex has been labelled with the set of users who know or can derive 
the corresponding key, thick edges represent tokens, and thin edges represent the 
computations of the key agreement functions. Note that the information about the 
users who can derive the key associated with a specific vertex does not necessar- 
ily coincide with the real identities of the users. As a matter of fact, each user can 
be identified via a pseudonym that may be selected by the user herself. Also, the 
root vertices of the key and token graph are the vertices representing the keys computed
through the key agreement functions in $\mathcal{K}_{\mathcal{U}}$; these keys need to be directly
computed by the users and do not exploit tokens. It is easy to see that each user u 
can directly or indirectly compute the keys associated with vertices along the paths 
starting from the vertex representing u. The first step is always a Diffie-Hellman 
computation whose resulting key is the starting point of the token chains followed 
by the user. Formally, the set of keys that a user can derive is defined as follows.


USER
user_id   public
A         $g^{e_A}$
B         $g^{e_B}$
C         $g^{e_C}$
D         $g^{e_D}$
E         $g^{e_E}$

RESOURCE
res_id   owner   label   enc_res
r1       A       AB      α
r2       A       ABC     β
r3       B       BDE     γ
r4       B       ABC     δ
r5       C       ABCDE   ε

TOKEN
source   destination   token_value
AB       ABC           $E_{k_{11}}(k_{21})$
AC       ABC           $E_{k_{12}}(k_{21})$
BD       BDE           $E_{k_{13}}(k_{22})$
BE       BDE           $E_{k_{14}}(k_{22})$
CD       ABCDE         $E_{k_{15}}(k_{31})$
CE       ABCDE         $E_{k_{16}}(k_{31})$
ABC      ABCDE         $E_{k_{21}}(k_{31})$


Fig. 9.4: An encryption policy catalogue. 


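Definition 9.6 (User keys). Given an encryption policy $\mathcal{E}=\langle \mathcal{U}, \mathcal{R}, \mathcal{K}, \mathcal{T}, \mathcal{K}_{\mathcal{U}}, \phi \rangle$,
the set of keys that a user $u \in \mathcal{U}$ can compute, denoted $K_u$, is defined as


$K_u = \bigcup \{\tau^*(ka_u(g^{e_{u_i}})) : u_i \in \mathcal{U},\ u_i \neq u,\ \text{and}\ ka_u(g^{e_{u_i}}) \in \mathcal{K}\}$.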


Each user $u$ can then access any resource $r$ such that $\phi(r) \in K_u$. For instance, with
respect to the encryption policy graph in Figure 9.3, the portion of the graph that
user A can exploit for key derivation is delimited by a continuous line and contains
the set $K_A=\{k_{11}, k_{12}, k_{21}, k_{31}\}$ of keys she can compute. Our goal is then to translate
the authorisation policies defined by the users in $\mathcal{U}$ into a correct encryption policy
$\mathcal{E}$. The concept of encryption policy correctness is formally defined as follows.


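Definition 9.7 (Correctness). Given a set $\mathcal{U}$ of users, a set $\mathcal{R}$ of resources, a set
$\mathcal{P}=\bigcup_{u \in \mathcal{U}} P_u$ of authorisation policies, and an encryption policy
$\mathcal{E}=\langle \mathcal{U}, \mathcal{R}, \mathcal{K}, \mathcal{T}, \mathcal{K}_{\mathcal{U}}, \phi \rangle$ over $\mathcal{U}$ and $\mathcal{R}$, we say that $\mathcal{E}$ correctly enforces $\mathcal{P}$ iff the
following conditions hold:
$\forall u \in \mathcal{U},\ r \in \mathcal{R} : \phi(r) \in K_u \Longrightarrow (u,r) \in \mathcal{P}$   (Soundness)
$\forall u \in \mathcal{U},\ r \in \mathcal{R} : (u,r) \in \mathcal{P} \Longrightarrow \phi(r) \in K_u$   (Completeness)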
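
Soundness and completeness can be verified on the running example with a short Python sketch. It represents each key by the label of its vertex, as in the public catalogue of Figure 9.4, and assumes that a vertex labelled with exactly two users is a root whose key those two users obtain by key agreement. The function names and data layout are illustrative assumptions.

```python
# Key assignment (vertex label per resource) and access control lists of Example 9.1.
LABELS = {"r1": "AB", "r2": "ABC", "r3": "BDE", "r4": "ABC", "r5": "ABCDE"}
ACLS = {"r1": set("AB"), "r2": set("ABC"), "r3": set("BDE"),
        "r4": set("ABC"), "r5": set("ABCDE")}
# Tokens as (source label, destination label) arcs.
TOKENS = [("AB", "ABC"), ("AC", "ABC"), ("BD", "BDE"), ("BE", "BDE"),
          ("CD", "ABCDE"), ("CE", "ABCDE"), ("ABC", "ABCDE")]


def user_keys(u, tokens, labels):
    """K_u: root keys shared by u with another user, plus all keys reachable via tokens."""
    vertices = {v for arc in tokens for v in arc} | set(labels.values())
    keys = {v for v in vertices if len(v) == 2 and u in v}
    changed = True
    while changed:
        changed = False
        for src, dst in tokens:
            if src in keys and dst not in keys:
                keys.add(dst)
                changed = True
    return keys


def correctly_enforces(tokens, labels, acls, users="ABCDE"):
    """True iff, for every user and resource, phi(r) in K_u exactly when u is in acl(r)."""
    for u in users:
        k_u = user_keys(u, tokens, labels)
        for r, label in labels.items():
            if (label in k_u) != (u in acls[r]):
                return False
    return True


print(sorted(user_keys("A", TOKENS, LABELS)))
print(correctly_enforces(TOKENS, LABELS, ACLS))
```

Adding a token from BD to ABC would let D derive the key protecting $r_2$ and $r_4$, so the check would report a soundness violation.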


The encryption policy graph in Figure 9.3 and the key assignment function in Fig- 
ure 9.2(b) represent a correct encryption policy given the authorisation policies in 
Example 9.1. To allow users to derive the keys needed for accessing the resources, a 
portion of the encryption policy must be publicly available from the external service 
responsible for resource management. This public information is represented as a 
catalogue composed of three tables: USER, RESOURCE, and TOKEN. 

Table USER contains a tuple for each user in $\mathcal{U}$ and has two attributes: user_id
is the user identifier and public is the public parameter (Diffie-Hellman public key)
of the user. Table RESOURCE includes a tuple for each resource in $\mathcal{R}$ and has four
attributes: res_id is the resource identifier; owner is the identifier of the user who
published the resource; label is the label of the vertex in the encryption policy graph
whose corresponding key is $\phi(r)$; and enc_res is the encrypted copy of the resource,
whose content has been preemptively signed by the data owner.

Table TOKEN contains a tuple for each token in $\mathcal{T}$ and is characterised by three
attributes: source and destination are the labels of the corresponding source and
destination vertices in the encryption policy graph; token_value is the token value
computed as $E_{k_{source}}(k_{destination})$.

Figure 9.4 illustrates the public catalogue corresponding to the key assignment 
function in Figure 9.2(b) and the encryption policy graph in Figure 9.3. In order to 
allow users to verify the authenticity of resources, without relying on the trusted be- 
haviour of the server, we assume resources to be signed by their owner. To this end, 
we adopt the DSA signature scheme: 1) the Diffie-Hellman parameters $e_u$ and $g^{e_u}$ can
be used as DSA private and public key, respectively; and 2) the Diffie-Hellman public
parameters $g$ and $q$ (Section 9.3.1) can be chosen to satisfy the security criteria
needed to use them also as DSA public parameters. Before outsourcing a resource
$r$, owner($r$) computes the plaintext digest of $r$ by applying a cryptographic hash
function $h$ to the plaintext content of $r$. She then signs $h(r)$ using her secret
Diffie-Hellman parameter $e_u$, and encrypts both $r$ and its signature. A user accessing $r$ can
check the signature of the resource, after its decryption, by using the public
Diffie-Hellman parameter $g^{e_u}$ of owner($r$) and the publicly available cryptographic hash
function $h$.


9.4 Resource Sharing Management 


The defined approach provides the users with the functionality for publishing and
accessing resources. The publish functionality allows data owners to compute the
digest, sign, and correctly encrypt their resources prior to delivering them to the
service provider. Each user $u$ can operate on a subset of the whole encryption policy
graph. As a matter of fact, $u$ is able to first create and then use token chains
whose starting points are the root vertices corresponding to keys that $u$ can compute
through Diffie-Hellman computations (i.e., root vertices representing $u$ and another
user in the system). Therefore, whenever user $u$ needs to share a resource $r$ with
other users in the system, she must first encrypt $r$ with a new key and then must add
the appropriate tokens that the other users in acl($r$) can use to derive the new key.
The creation of these new tokens can exploit token chains previously created by $u$ to
reach vertices representing a subset of the users in acl($r$). Note that if there already
exists a key only derivable by users in acl($r$), $u$ simply computes such a key through
the appropriate token chain and then encrypts $r$ with the derived key.
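
The graph side of a publish operation might be sketched in Python as follows. Vertices are represented by the set of users who can derive the corresponding key. The greedy choice of which existing vertices to reuse is an assumption introduced here for illustration, since the text only states that previously created token chains can be exploited; digest computation, signing, and encryption are left out.

```python
def publish(owner, acl, vertices, tokens):
    """Return the vertex whose key encrypts the resource, adding tokens if needed.

    vertices: set of frozensets of users; tokens: set of (source, destination) arcs.
    Both are updated in place.
    """
    target = frozenset(acl)
    if target in vertices:
        return target                 # a key for this acl already exists: reuse it
    vertices.add(target)
    if len(target) == 2:
        return target                 # root vertex: key obtained by key agreement
    uncovered = set(target) - {owner}
    while uncovered:
        # Existing vertices the owner belongs to and that include only users in acl.
        candidates = [v for v in vertices if owner in v and v < target]
        best = max(candidates,
                   key=lambda v: (len(v & uncovered), sorted(v)), default=None)
        if best is None or len(best & uncovered) < 2:
            # Nothing worth reusing: fall back to the pairwise key with one user.
            best = frozenset({owner, min(uncovered)})
            vertices.add(best)
        tokens.add((best, target))
        uncovered -= best
    return target


fs = frozenset
# State before the publish operations: r1 -> AB, r2 -> ABC, r3 -> BDE.
VERTICES = {fs("AB"), fs("AC"), fs("ABC"), fs("BD"), fs("BE"), fs("BDE")}
TOKENS = {(fs("AB"), fs("ABC")), (fs("AC"), fs("ABC")),
          (fs("BD"), fs("BDE")), (fs("BE"), fs("BDE"))}

v4 = publish("B", "ABC", VERTICES, TOKENS)     # reuses the existing ABC key
v5 = publish("C", "ABCDE", VERTICES, TOKENS)   # new key, three new tokens
print(sorted("".join(sorted(s)) + "->" + "".join(sorted(d)) for s, d in TOKENS))
```

On this input the sketch ends with the seven tokens listed in the catalogue of Figure 9.4.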


Example 9.2. Figure 9.5 illustrates the evolution of an encryption policy graph following
a sequence of publish operations. There are five users, $\mathcal{U}$={A, B, C, D, E},
and at the initial state, the authorisation policies already enforced through the hierarchy
are: $P_A$={(A,$r_1$), (B,$r_1$), (A,$r_2$), (B,$r_2$), (C,$r_2$)}, $P_B$={(B,$r_3$), (D,$r_3$), (E,$r_3$)}.
Upon each publication, if there is no key for the involved acl, a new key is
generated together with the tokens allowing derivation of the key from all the users 
in the acl. The figure also reports the keys that must be computed between pairs of 
users for ensuring such a derivation. 




Initial state: $P_A$={(A,$r_1$), (B,$r_1$), (A,$r_2$), (B,$r_2$), (C,$r_2$)},
               $P_B$={(B,$r_3$), (D,$r_3$), (E,$r_3$)}
Request from B: Publish($r_4$, B, $e_B$, {A,B,C})

Fig. 9.5: A sequence of publishing operations.


The access functionality allows users to retrieve the resources that they are autho- 
rised to access and to verify their signature. In particular, every time an authorised 
user u needs to access a resource r, the service has to deliver the encrypted resource 
to $u$ along with a token chain ending at the vertex representing acl($r$), which the
user follows to derive the decryption key. User u can then decrypt the resource and 
use the public Diffie-Hellman parameter of owner(r) to verify the signature of r. 
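
The selection of a token chain by the access functionality can be sketched as a breadth-first search over the TOKEN table. In this Python sketch the vertex labels are those of Figure 9.4, two-user labels are assumed to be the root vertices a user can compute by key agreement, and the function name is hypothetical.

```python
from collections import deque

# Arcs of the key and token graph, as (source label, destination label).
TOKENS = [("AB", "ABC"), ("AC", "ABC"), ("BD", "BDE"), ("BE", "BDE"),
          ("CD", "ABCDE"), ("CE", "ABCDE"), ("ABC", "ABCDE")]


def token_chain(user, target, tokens):
    """Shortest path of vertex labels from one of the user's roots to `target`.

    Returns None when the user cannot derive the key of `target`.
    """
    vertices = {v for arc in tokens for v in arc} | {target}
    roots = sorted(v for v in vertices if len(v) == 2 and user in v)
    queue = deque((root, [root]) for root in roots)
    seen = set(roots)
    while queue:
        vertex, path = queue.popleft()
        if vertex == target:
            return path
        for src, dst in tokens:
            if src == vertex and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [dst]))
    return None


print(token_chain("B", "ABC", TOKENS))    # B reaches the key of r2 from AB
print(token_chain("D", "ABC", TOKENS))    # D is not in acl(r2): no chain
```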


Example 9.3. Consider the state resulting from the sequence of operations of Example 9.2
and the request of user B to access $r_2$. Since B $\in$ acl($r_2$), the access function
extracts $r_2^*$ from table RESOURCE, finds the shortest path from $v_{11}$ to $v_{21}$ (i.e., the
vertex whose key is used for encrypting $r_2$), and derives $k_{21}$ from $k_{11}$. The resource
$r_2^*$ is decrypted using $k_{21}$ and the resulting signature is verified through the public
parameter of the resource's owner A.


9.5 Comparison with PGP's Key-Management Strategy


An alternative solution for enforcing an authorisation policy via encryption could
follow a PGP-based approach.² Following this model, each user in the system
has a public/private key pair and each resource $r$ is encrypted with an arbitrary
symmetric key $k$. For each user $u$ in acl($r$), owner($r$) encrypts $k$ with the public key
of $u$. Both the encrypted resource and the encrypted copies of its encryption key
are stored at the storage server. To access resource $r$, a user $u \in$ acl($r$) needs first to
decrypt the symmetric encryption key $k$ with her private key and then to decrypt the
resource with $k$. Despite being simple, this approach does not scale well, since a
resource shared among a large community has to be extended with a large descriptor
containing the resource key encrypted with the public key of every user authorised
to access the resource; our solution instead exhibits lower storage usage. Suppose the
encoding of any acl requires 4 bytes, the size of a symmetric key is 128 bits, and a
symmetric key is encrypted with a 1024-bit public key, thus obtaining an encryption
block of 128 bytes. A PGP-based solution requires the addition of a new column in
table RESOURCE for storing the resource descriptors. The additional storage space
necessary to maintain these resource descriptors is $128 \sum_{r \in \mathcal{R}} |acl(r)|$ bytes. Our
approach instead requires the storage of table TOKEN, which is not needed with PGP.
Since the size of each row of table TOKEN is 24 bytes (4 bytes for attribute source,
4 bytes for attribute destination, and 16 bytes for attribute token_value, which
corresponds to the ciphertext obtained by encrypting a symmetric key with a symmetric
algorithm), the total storage space required for table TOKEN is $24|\mathcal{T}|$ bytes, where
$|\mathcal{T}|$ corresponds to the number of edges in the key and token graph. In the worst
case, each resource in $\mathcal{R}$ has a different access control list composed of more than
two users, thus implying at most $|\mathcal{R}|$ non-root vertices in the graph. Since each
non-root vertex $v$ representing a set acl of users has at most $|acl| - 1$ incoming edges, the
storage space required by table TOKEN is less than $24 \sum_{r \in \mathcal{R}} |acl(r)|$ bytes, which
is smaller than the space required by a PGP-based solution by a factor of about 5.3
(that is, 128/24). Consider now a
system where acl($r_i$)=$\{u_1,\ldots,u_{i+1}\}$ and owner($r_i$)=$u_1$, $i = 1,\ldots,|\mathcal{R}|$. This
configuration is the best case in terms of the storage space required for our approach: the
space for storing the resource descriptors is $128(|\mathcal{R}|+1)(|\mathcal{R}|+2)/2$ bytes and the
storage space required for table TOKEN is $24|\mathcal{T}| = 24(2|\mathcal{R}|)$ bytes since each vertex
in the key and token graph has 2 incoming edges, which is clearly less than the
space needed with a PGP-based solution. Another drawback of a PGP-based solution
is that requiring each resource to be associated with its own descriptor does not
allow for exploiting the fact that different resources might have the same access control
list. The management of key updates is also difficult, since a public key update
would require recomputing all of the descriptors of the corresponding resources.

In our solution, this operation would instead require the update of tokens corre- 
sponding to the arcs outgoing from a few root vertices of the key and token graph. 
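
As a worked instance of the storage estimates, the following Python sketch applies the stated sizes (128-byte encryption block per authorised user for the PGP-based descriptors, 24 bytes per row of table TOKEN) to the running example, with the access control lists of Example 9.1 and the seven tokens of Figure 9.4. The variable names are illustrative assumptions.

```python
# Sizes of the access control lists of Example 9.1.
ACL_SIZES = {"r1": 2, "r2": 3, "r3": 3, "r4": 3, "r5": 5}
NUM_TOKENS = 7                    # rows of table TOKEN in Figure 9.4

PGP_BLOCK_BYTES = 128             # symmetric key encrypted with a 1024-bit public key
TOKEN_ROW_BYTES = 4 + 4 + 16      # source + destination + token_value

# PGP-based descriptors: one encrypted copy of the key per authorised user.
pgp_bytes = PGP_BLOCK_BYTES * sum(ACL_SIZES.values())
# Token-based approach: one row per token.
token_bytes = TOKEN_ROW_BYTES * NUM_TOKENS
# Upper bound on the TOKEN table size used in the worst-case argument.
token_bound = TOKEN_ROW_BYTES * sum(ACL_SIZES.values())

print(pgp_bytes, token_bytes, token_bound)
print(round(PGP_BLOCK_BYTES / TOKEN_ROW_BYTES, 1))
```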


2 IETF—OpenPGP Working Group, RFC 4880, http: //www.openpgp.org/ 
9.6 Exposure Evaluation


Threats in key-agreement or key-distribution schemes are realised by an adversary 
who may eavesdrop, replay, or substitute messages that are transmitted over a com- 
munication channel. In the considered scenario, each user accesses the storage ser- 
vice from different and independent locations, and the security threats are due to 
a subject (service or user) that aims at accessing resources with acls that do not 
include her. The encryption is assumed to be robust, thus improper accesses to a 
resource by unauthorised parties can only happen if a party improperly acquires the 
key with which the resource is encrypted. Impersonation of the service is a threat ex- 
cluded from the analysis since the communication between the user and the service 
is assumed to be over an SSL channel. Also, the key derivation method is assumed to
be secure since a user knowing a key $k_i$ associated with a vertex $v_i$ cannot compute
keys associated with vertices that are not descendants of $v_i$ in the encryption policy
graph [AFB05, ADFM06]. The way a malicious party can get an access key is by
masquerading as a legitimate user so that other users in the system are provided 
with the public DH parameter of the malicious party instead of the correct one. The 
problem of users claiming an identity they do not own (e.g. phony celebrity pages 
in Facebook and MySpace) lies outside of the technical realm. We are instead inter- 
ested in the technical problem of preventing the service from behaving maliciously, 
thereby compromising the confidentiality of resources by presenting DH parameters 
that the service controls. The traditional techniques to overcome this threat consist
in exchanging the public DH parameters by adopting 1) an out-of-band user-to-user
communication on a trusted channel, or 2) a solution involving one or more certification
authorities. These techniques are well known and robust, but they are also known 
to be costly when each user of the system has to rely on them for their security. 
Also, an identity-based approach [BF03], despite being certificate-less, does not fit 
the requirements for our scenario because it basically shifts the user’s trust from the 
storing service provider to a centralised Key Generation Center. 


9.6.1 Anonymous Accesses 


A novel and inexpensive possibility is based on the ability of users to query the 
public table USER anonymously. The service provider can easily monitor the com- 
munication channels employed by users and modify the content of the public tables. 
While the service provider appears extremely powerful, we assume the provider has 
a strong incentive to have a good reputation among users, since it is sufficient for a 
limited number of users to report improper behaviour of the service to have all other 
users lose confidence in it. According to this observation, our approach is based on 
the execution of random checks, by the users, on the honest behaviour of the ser- 
vice. Suppose a service provider S behaves maliciously with the goal of accessing a 
resource that a legitimate victim B is entitled to access. The attack can be directed 
to resources owned by a particular user A or to all resources that B could access. Let
us consider first the attack aimed at acquiring the resources of a specific user A, and
then generalise the treatment.


Fig. 9.6: Man-in-the-middle attack.


To act as a man-in-the-middle (MitM), masquerading
as B with respect to A, upon A’s request to retrieve the public parameter of B (i.e.,
$g^{e_B}$), the service S will have to respond with a fake public parameter $g^{e'_B}$. Then, S
will be able to derive the common key between B and A, and therefore access
resources whose acl is equal to, or contains, {A, B}. However, to avoid being detected
by B, the service should ensure B’s ability to access the resources that A wishes to
share with B. This forces S to encrypt the resources also with a key that both S and
B can compute. Hence, S will have to masquerade as A for B and will have to: 1)
sign a copy of A’s resource (which S can now acquire) with a fake Diffie-Hellman
private parameter $e'_A$ (instead of the genuine $e_A$); 2) encrypt the copy of A’s resource
and its signature to be made accessible to B; 3) provide to B the fake public parameter
$g^{e'_A}$, allowing B to verify the signature and determine the key with which the
copies of the resources have been encrypted; 4) create a new copy of all the tokens
that were supposed to originate from the authentic common key between B and A
so that they originate from the fake key (actually agreed between B and the service).
These tokens are needed to ensure B will be able to access all resources whose acl
includes {A, B}. The MitM attack implies that the service will always have to return
the fake B’s public parameter $g^{e'_B}$ to A, and the fake A’s public parameter $g^{e'_A}$
to B, while instead returning the correct $g^{e_B}$ and $g^{e_A}$ to other users. Note that due to
the symmetric nature of the attack, by aiming at acquiring access to A’s resources 
accessible to B, the service also acquires access to B’s resources accessible to A. 
Note also that this symmetric behaviour makes the attack applicable only to pairs 
of users that have never shared resources before. If the service aims at accessing all 
the resources to which B has access, S should mount a similar attack for all the other 
users in the system. Figure 9.6 illustrates an example of an encryption policy graph 
in the case of a malicious service mounting a MitM attack between users A and B. 
Our protection relies on two easily enforceable assumptions. First, we assume that 
public parameter requests to the service can be made anonymously (e.g., via a proxy 
or a mixing protocol [DMS04]), so that the service will not be able to infer which 
user is submitting the request. Second, we assume users can randomly query the
service for their own Diffie-Hellman public key or for other keys they already know.
With respect to our example, the service will not know if the request for B’s public
parameter comes from a user different from A, including B (and therefore it should
respond $g^{e_B}$), or from A (and therefore it should respond $g^{e'_B}$). While the service can
try to guess the source of the request, it is reasonable to expect a non-negligible
probability $p_w$ of a wrong guess. It is then possible to put constraints on the number
of attacks that the service would be able to realise without being detected; with a
simple statistical model, we obtain that $\lceil 1/p_w \rceil$ checks will be sufficient to detect
the illicit behaviour with probability at least $1 - 1/e$ (0.632).³ The probability of the
attack being detected quickly increases with the increase of the number of anony- 
mous retrievals of keys. Hence, since the service has a strong incentive to keep its 
reputation intact, it is clearly driven to avoid the MitM attack. For systems aiming 
to serve a large community of users, we expect this protection to be able to offer a 
high degree of robustness. 
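
The detection bound can be reproduced with a few lines of code. This is a minimal sketch of the simple statistical model mentioned above, under the assumption that the service guesses the source of each anonymous request independently and guesses wrong with probability p_wrong; the function names are illustrative and do not appear in the original work.

```python
import math


def checks_needed(p_wrong: float) -> int:
    """Number of anonymous checks suggested by the text: ceil(1 / p_wrong)."""
    return math.ceil(1 / p_wrong)


def detection_probability(p_wrong: float, checks: int) -> float:
    """Probability that at least one of `checks` independent checks
    receives an inconsistent answer (assumed independence model)."""
    return 1 - (1 - p_wrong) ** checks


if __name__ == "__main__":
    for p in (0.25, 0.1, 0.01):
        k = checks_needed(p)
        print(f"p_wrong={p}: {k} checks, detection >= {detection_probability(p, k):.3f}")
```

Under this model the probability of escaping detection decays geometrically with the number of checks, which is why a few additional anonymous retrievals beyond the bound push the detection probability close to 1.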


9.7 Encryption Policy Updates 


The encryption policy described in the previous sections assumes that keys and to- 
kens are computed on the basis of existing authorisation policies before sending the 
encrypted resources to the server. While the authorisations set at initialisation time 
might not be changed too frequently, many situations require dynamic alterations 
to them in order to grant privileges to new users or revoke them from existing ones.
Therefore, every time an authorisation on a resource r is granted or revoked, acl(r) 
changes accordingly. In terms of encryption policy, this mandates a change of the 
key used to encrypt the resource, so that it will be accessible only to users in the 
modified acl. This operation requires decrypting the resource (with the key with 
which it is currently encrypted), to retrieve the original plaintext (since the owner 
may not keep a local copy of her outsourced data), and then re-encrypt it with the 
new key. Such an overhead, in terms of both communication and computation, for 
managing authorisation changes does not fit current scalability needs. Therefore, 
we defined mechanisms to also outsource the evolution of the authorisation policies.
Note that this delegation is possible since the server is considered trustworthy,
i.e., it is assumed to properly carry out the service, although it is not trusted to respect
data confidentiality. The solution, called Over-encryption [DFJ+07, DFJ+10c],
enforces policy changes on the encrypted resources themselves without the need of
decrypting them, and may thus be performed by the server. The security goal con- 
sists in minimising the server gain in colluding with some user to get access to data 
owned by others. 


3 If $p_w$ is equal to 1/n, the probability for the server of guessing correctly n times in a row
is $(1 - 1/n)^n$, which is a well-known sequence approximating $e^{-1}$.
9.7.1 Two-Layered Encryption Model


The encryption policy model is enriched with two layers of encryption: 


A Base Encryption Layer (BEL) is applied by the data owner before transmitting 
data to the server in order to enforce the access policy existing at initialisation 
time, as described in the previous sections. 

A Surface Encryption Layer (SEL) is applied by the service provider, over the re- 
sources already encrypted by the data owner, to enforce the dynamic changes 
over the policy. 


Each user (as both data owner and data consumer) remains directly responsible only 
for her own Diffie-Hellman secret key, while the enforcement of the encryption pol- 
icy requires her to perform, at SEL level, a key agreement with the service provider, 
and at BEL level, a non-interactive key agreement with the target data owner. In or- 
der to access the target resource, each user has to encrypt (respectively decrypt) the 
resource twice: firstly with the unique access key at SEL layer, and secondly with 
the key corresponding to the unique access key at BEL layer. 
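
The layering described above can be illustrated with a toy example. This is a sketch only: the cipher below is a deliberately simple XOR keystream built from SHA-256, chosen so that the example runs with the standard library; it is not secure and is not the cipher used in the original work. All names and keys are hypothetical.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (XOR with a SHA-256 counter keystream).
    Encryption and decryption are the same operation. Illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def owner_upload(plaintext: bytes, bel_key: bytes) -> bytes:
    """BEL: applied by the data owner before sending data to the server."""
    return toy_cipher(bel_key, plaintext)


def server_over_encrypt(bel_ciphertext: bytes, sel_key: bytes) -> bytes:
    """SEL: applied by the server on top of the BEL ciphertext."""
    return toy_cipher(sel_key, bel_ciphertext)


def user_download(stored: bytes, sel_key: bytes, bel_key: bytes) -> bytes:
    """The user removes the SEL layer first, then the BEL layer.
    (With this XOR toy the order happens not to matter; with a real
    cipher it would.)"""
    return toy_cipher(bel_key, toy_cipher(sel_key, stored))


if __name__ == "__main__":
    bel_key, sel_key = b"hypothetical-bel-key", b"hypothetical-sel-key"
    stored = server_over_encrypt(owner_upload(b"confidential", bel_key), sel_key)
    print(user_download(stored, sel_key, bel_key))
```

The server, which knows only the SEL key, can strip the outer layer but is left with the BEL ciphertext, which matches the assumption that it is not trusted with data confidentiality.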

Two basic approaches can be followed in the construction of the two levels, 
called Full-SEL and Delta-SEL, having different performances and protection
guarantees [DFJ+07, DFJ+10c].


Full-SEL. With this mode of encryption, the SEL policy is set to mimic the BEL 
policy: for each derivation key in BEL, a corresponding key is defined in SEL, 
and for each token in BEL, a corresponding token is defined in SEL. 

Delta-SEL. With this mode of encryption, when a resource is uploaded for the first 
time, the SEL level does not add any additional protection, while enforcing a 
double encryption only when a change in the existing authorisation policies of a 
data owner is requested. 


An approach where the authorisation enforcement is completely delegated at the 
SEL level, whilst the BEL one simply applies a uniform over-encryption to pro- 
tect the plaintext content from the server’s eyes (i.e., each data owner selects only 
one key, and distributes it to all the data consumers she wants to share her resource 
with), would present a significant exposure to collusion attacks. The Full-SEL approach
always requires double encryption to be enforced (even when authorisations remain
unchanged), thus doubling the decryption load of users for each access. However,
in terms of efficiency, the use of a double layer of encryption adds only a negligible
computational overhead [DFJ+10c]. By contrast, the Delta-SEL approach requires
double encryption only when actually needed to enforce a change in the authorisations.
The reasons for choosing one of the two modes of operation over the other
are related to both the collusion threats and the data owner’s inferences on the
granularity of the access control policies imposed on the resources she wants to share.
Moreover, the assumptions on the evolution over time of these policies play a key 
role in the decision (see Section 9.7.3). 
9.7.2 Over-Encryption


For each data owner O, the grant and revoke policy alterations are enforced through 
a two-layer encryption system, embodied in the over-encrypt($R_O$, acl($R_O$)) primitive,
where $R_O$ is the set of resources that must be accessible only to users in the
corresponding access control list, acl($R_O$). The over-encrypt primitive manages the
proper resource confinement through the SEL layer in order to deny the access to 
a resource to a certain subset of users, while it employs the BEL layer in order to 
establish which users are allowed to access it. This results in two independent access
control actions, which are applied together in order to compose the desired
access restrictions. For instance, if two resources r, r’ share the same access control
list (and thus the same BEL key), granting access to r to a new user (i.e.,
allowing her to access r through the related BEL key) will also give her
the right to access r’. In order to compensate for this undesired privilege, the SEL
encryption layer is updated to deny the new user access to r’, by over-encrypting
r’ with a new key, which can be derived only by the legitimate members
of acl(r’). Consequently, revoking the access rights to a resource r’’ for a specific
user is as simple as removing the user from acl(r’’) and subsequently employing the
SEL encryption layer in order to prevent her from accessing r’’.

Depending on whether the Full- or Delta-SEL mode of encryption is employed, 
the over-encryption operation is performed in two different ways. In the Full-SEL 
case, the SEL encryption layer is employed to deny the access to a resource r to all 
the users but the ones in acl(r). This is done regardless of whether those users are
able to derive the related BEL key. By contrast,
in the Delta-SEL approach, the locking effect of the SEL layer is employed only
in order to limit the access rights granted by the BEL keys owned by the no longer
authorised users [DFJ+07, DFJ+10c].
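
The grant and revoke behaviour described in this section can be sketched with plain sets in place of keys. This is an illustrative model in the spirit of the Delta-SEL mode, not the over-encrypt primitive itself: knowing a key is modelled as membership in a set, and the dictionary layout and function names are assumptions made for the example.

```python
def authorised(store, resource):
    """Users who can open both the BEL and the SEL layer of `resource`."""
    holders = store["bel_holders"][store["bel_key_of"][resource]]
    sel = store["sel_acl"].get(resource)  # None means no SEL layer yet
    return set(holders) if sel is None else holders & sel


def grant(store, user, resource):
    """Give `user` the BEL key of `resource`, and over-encrypt the other
    resources sharing that key so that their readers do not change."""
    key = store["bel_key_of"][resource]
    for other, other_key in store["bel_key_of"].items():
        if other != resource and other_key == key:
            store["sel_acl"][other] = authorised(store, other)
    if resource in store["sel_acl"]:
        store["sel_acl"][resource].add(user)
    store["bel_holders"][key].add(user)


def revoke(store, user, resource):
    """The BEL key cannot be taken back, so lock the user out at SEL."""
    store["sel_acl"][resource] = authorised(store, resource) - {user}


def example_store():
    # Two resources encrypted with the same (hypothetical) BEL key k1.
    return {
        "bel_key_of": {"r": "k1", "r_prime": "k1"},
        "bel_holders": {"k1": {"alice", "bob"}},
        "sel_acl": {},
    }


if __name__ == "__main__":
    store = example_store()
    grant(store, "carol", "r")
    print(sorted(authorised(store, "r")), sorted(authorised(store, "r_prime")))
    revoke(store, "bob", "r_prime")
    print(sorted(authorised(store, "r_prime")))
```

In this model, granting carol access to r leaves r_prime readable only by its original users, and revoking bob from r_prime does not affect his access to r.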


9.7.3 Collusion Evaluation 


We assume both the key derivation functions and the encryption primitives are se- 
mantically and provably secure [AFB05], even when combining the information 
available to many users. Moreover, we assume that each user correctly manages her 
keys. It still remains to evaluate whether the approach is vulnerable to attacks from 
users who access and store all information offered by the server, or from collusion 
attacks, where different users (or a user and the server) combine their knowledge 
to access resources they would not otherwise be able to access. Note that for collu- 
sion to exist, both parties should benefit from it, otherwise they will not have any 
incentive in colluding. In order to model the information leakage, we assume that 
users are not oblivious (i.e., they have the ability to store and keep indefinitely all 
information they were entitled to access). In order to examine the different views 
that each user can have on a resource r, we employ a graphical notation with
resource r in the center and with fences around r denoting the barriers to the access
imposed by the knowledge of the keys used for r’s encryption at both the BEL level
(inner fence) and the SEL level (outer fence). The fence is continuous if there is no
knowledge of the corresponding key and it is discontinuous otherwise. Figure 9.7(a)
shows the view of the SEL server itself, which knows the SEL-level key, but does
not have access to the BEL-level key. On the right, the open view corresponds to
the view of authorised users, while the remaining ones (locked, SEL-locked,
BEL-locked) show the views of non-authorised users.


9 Selective Exchange of Confidential Data in the Outsourcing Scenario 197


[Figure: server’s view (a); user’s views: open (b), locked (c), SEL-locked (d), BEL-locked (e)]

Fig. 9.7: Possible views on resource r.



Collusion can take place every time two entities, combining their knowledge (i.e., 
the keys known to them) can acquire knowledge that neither of them has access to. 
Therefore, users having the open view need not be considered as they have nothing 
to gain in colluding (they already access r). Following the same line of reasoning, 
users having the locked view will not be considered, since they have nothing to
offer. In the Full-SEL approach, no one but the server can have a BEL-locked
view, while only a user can have an SEL-locked view. This describes the only
possible threat of collusion, because the knowledge of the server allows for lowering 
the outer fence, while the knowledge of the user allows for lowering the inner fence. 
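
The four views and the collusion condition can be expressed compactly in code. This is a minimal sketch under the reading that a view is determined by which of the two keys (BEL, SEL) an entity knows; the function names are hypothetical.

```python
def view(knows_bel: bool, knows_sel: bool) -> str:
    """Name of the view on a resource, following the labels of Fig. 9.7."""
    if knows_bel and knows_sel:
        return "open"
    if knows_bel:
        return "SEL-locked"   # only the outer (SEL) fence is closed
    if knows_sel:
        return "BEL-locked"   # only the inner (BEL) fence is closed
    return "locked"


def collusion_pays(a, b) -> bool:
    """True if two entities, given as (knows_bel, knows_sel) pairs, can
    open the resource together although neither can open it alone."""
    combined = (a[0] or b[0], a[1] or b[1])
    return (view(*combined) == "open"
            and view(*a) != "open"
            and view(*b) != "open")


if __name__ == "__main__":
    server = (False, True)            # knows the SEL key only
    sel_locked_user = (True, False)   # knows the BEL key only
    print(view(*server), view(*sel_locked_user))
    print(collusion_pays(server, sel_locked_user))
```

Enumerating all pairs of views with this function shows that the only profitable combination is a BEL-locked entity with an SEL-locked one, which in the Full-SEL approach means the server and a user.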

There are only two reasons for which a user can have the SEL-locked view
on a resource. 1) The user was previously authorised to access the resource and the
authorisation was then revoked. In this case, since the user is supposed to be
non-oblivious, she has no gain in colluding with the server. It is therefore legitimate to
consider this case ineffective with respect to collusion risks.⁴ 2) The user has been
granted the authorisation for resource r’ that was, at initialisation time, encrypted
with the same key as r (i.e., $acl(r') \subset acl(r)$), leaving r SEL-locked [DFJ+07,
DFJ+10c]. In this situation (r from locked to SEL-locked), the user has never
had access to r and must not be able to gain it, therefore there is indeed exposure to 
collusion. 


4 We assume, without loss of generality, that any time a resource is updated, the data owner encrypts
it with another BEL key, as if it were a new one.


The Full-SEL approach provides superior protection, as it reduces the risk of exposure,
which is limited to collusion with the server. By contrast, the Delta-SEL approach
also exposes resources to single (planning-ahead) users. Therefore, each data owner,
when choosing between the use of Delta-SEL or Full-SEL, should prefer the former
when it is likely that: her access policy will be relatively static, sets of resources
sharing the same acl at initialisation time represent a strong semantic relationship
rarely split by policy evolution, or resources are grouped in the BEL in fine gran- 
ularity components where most of the BEL nodes are associated with a single or 
few resources. Indeed, in these situations, the risk of information leakage due to 
collusion is limited also in the Delta-SEL approach. By contrast, if authorisations 
have a more dynamic and chaotic behaviour, the Full-SEL approach may be pre- 
ferred to limit exposure due to collusion (necessarily involving the server). Also, 
the collusion risk may be minimised through a proper organisation of the resources 
to reduce the possibility of policy splits. This could be done either by employing a 
finer encryption granularity and/or better identifying resource groups characterised 
by a persistent semantic affinity (in both cases, using in the BEL different keys for 
resources with identical acl). 


9.8 Conclusions 


The outsourcing to honest-but-curious service providers of large data collections 
requires the definition of novel access control systems, to support the selective re- 
lease of outsourced information to authorised users. In this chapter, we illustrated a 
novel approach combining authorisations and encryption to enforce access control 
policies defined by different data owners, who want to selectively share their re- 
sources. Access control policies translate into equivalent encryption policies, such 
that each user can decrypt all and only the resources she is authorised to access. We 
also described a solution that allows data owners to outsource to the service provider 
possible updates to the access control policies, by nicely combining two layers of 
encryption. We note that the access control system introduced in this chapter can be 
easily integrated with the proposal in [DFJ+08], to protect the confidentiality of the
policy if it needs to be kept secret. 
Introduction


PrimeLife has the vision of bringing sustainable and user-controlled Privacy and 
Identity Management to future networks and services. User-controlled Privacy and 
Identity Management implies that users can make informed choices about the selection
of appropriate (anonymous) credentials for proving personal properties, the release
of personal data items, as well as the selection and adaptation of privacy and trust
policies. For enabling well-informed decisions, the user interfaces have to present 
information about the trustworthiness of communication partners. Additionally, in- 
formation about the privacy implications of those choices in terms of the linkability 
of the user’s partial identities and of their communication partners’ data handling 
policies should be displayed. This information about choices and their implications 
must be well understandable, noticeable and must comply with legal requirements. 
At the same time, it should not be perceived as too interfering or disturbing. Actions 
needed for performing choices should also be easy to handle. The challenges of re- 
searching and developing user interfaces for Privacy and Identity Management that 
are intuitive, user-friendly and compliant with legal and social requirements have 
been addressed by PrimeLife Activity 4 (HCI).

In this part of the book, we will present the HCI research within PrimeLife, which 
aims in particular at addressing the following research challenges: 


• Available system usability scales and questionnaires for measuring user experiences
and usability of various HCI aspects do not address PET-related issues. A
special focus of the HCI research within PrimeLife has therefore been
on the development of novel methodologies for evaluating HCI solutions
for PETs that can be used within PrimeLife. Chapter 10 of this part presents
PET-USES (Privacy-Enhancing Technology Users’ Self-Estimation Scale), which
we have developed and used within PrimeLife to evaluate PrimeLife user interfaces.

• Privacy-enhancing technologies (PETs) are based on complex technical concepts
or constructs such as pseudonyms, unlinkability and anonymous credentials that
are unfamiliar to many end users and often do not fit their mental picture of
what belongs to an electronic identity and how it can be technically protected.
How can a notion about privacy and electronic identity be illustrated to the user
for estimating the risk of being identified across different interactions with one or
several communication partners? How can the user be assisted in understanding
and taking advantage of the privacy-enhancing features of PrimeLife technologies?
Chapter 11 on HCI for PrimeLife Prototypes presents the HCI work done in
terms of development and testing of PrimeLife prototypes developed within the
PrimeLife Activities 1 and 2 (described in Parts I and II of this book). In Chapter 12
(The Users’ Mental Models’ Effect on their Comprehension of Anonymous
Credentials), we analyse what effects the users’ mental models have on their
understanding of the selective disclosure property of anonymous credentials.

• How can the user interfaces mediate reliable trust in PrimeLife technology and
communication partners to end users? For addressing this problem, we have
conducted research on the design of a user-friendly trust evaluation function that
can convey information stated or certified by third (trustworthy) parties to end
users about the level of trustworthiness of communication partners. User-friendly
transparency tools are also a means for enhancing the users’ trust in PrimeLife
technologies. Within PrimeLife, we have developed and tested the data track,
which is a transparency tool that provides the user with a history function
documenting what personal data the user has revealed under which conditions. The
data track also includes online functions allowing a user to exercise her right
to access her data on remote services sides. Chapter 13 on Trust and Assurance
HCI reports on the HCI work for the trust evaluation function and PrimeLife data
track.

• How can privacy and trust policy definitions, administration and negotiations be
simplified for end users by appropriate means for policy presentation, predefined 
settings and automation in a way compliant with European privacy legislation? 
In PrimeLife, we have researched novel concepts for a simplified management 
of privacy preferences and user-friendly display of data handling policies of ser- 
vices sides including information about how closely they match the user’s privacy 
preferences. Results of our research on policy-related HCI aspects are reported 
in Chapter 14 on HCI for Policy Display and Administration and Chapter 15 on 
Privacy Policy Icons. 


Chapter 10 
PET-USES 


Erik Wästlund and Peter Wolkerstorfer


Abstract This chapter describes the PET-USES [Privacy-Enhancing Technology 
Users’ Self-Estimation Scale], a questionnaire that enables users to evaluate PET 
User Interfaces [UIs] for their overall usability and to measure six different PET 
aspects. The objective of this chapter is to outline the creation and the background 
of the PET-USES questionnaire and invite the usability community to not only use 
the test but also contribute to the further development of the PET-USES. This text 
is an excerpt of [WWK10] which additionally contains a more elaborate description 
of the rationale behind the PET-USES. 


10.1 Introduction 


PET-USES [Privacy-Enhancing Technology Users’ Self-Estimation Scale] is a
questionnaire that enables users to evaluate PET User Interfaces [UIs].
The reason for developing and using PET-USES was to be able to measure the
perceived usability of UIs, both during single user trials and during large group
walkthroughs of screen recordings. Although there are a number of questionnaires
measuring user experience, usability and various HCI (human-computer interaction) 
aspects such as the hedonic quality [HBK03] of both software and websites [Bro96] 
[TS04], to our knowledge none include PET-related issues. 

The PET-USES consists of two major parts of questions: one part measuring 
overall usability and one part measuring PET-aspects. Thus, the PET-usability scales 
have a dual purpose. They evaluate the software’s general usability and the extent 
to which the software assists the user in learning and understanding privacy-related 
issues. The PET-USES questionnaire consists of the following modules (the detailed 
content can be seen in the Appendix): 

Part I — Usability:


• General Usability
• Ease of Learning
• Ease of Use
• User Value


Part II — PET-related aspects:


• Data Management
• Credential Management
• PrivPrefs¹
• Recipient Evaluation
• Data Release
• History


J. Camenisch et al. (eds.), Privacy and Identity Management for Life, 213
DOI 10.1007/978-3-642-20317-6_10, © Springer-Verlag Berlin Heidelberg 2011


An important feature of the measurement of PET-aspects is the modularity of the 
questionnaire, enabling the inclusion or exclusion of scales measuring specific as- 
pects based on the tasks and features being evaluated, e.g., dependent on the context 
of use, the Credential Management part could be excluded from the questionnaire. 

The PET-USES questionnaire is based on the ISO 9241 general standard of us- 
ability [ISO88] as well as the more domain specific HCI guidelines presented by 
Patrick et al. 2003 [PKHvB03] and utilised in the work with the PRIME integrated 
IDM prototype [Pet05]. The former defines usability as the “extent to which a product
can be used by specified users to achieve specified goals with effectiveness,
efficiency, and satisfaction,” whereas the latter promotes the four categories
comprehension (to understand or know), consciousness (be aware or informed), control
(to manipulate or be empowered) and consent (to agree). Although the two views
might seem divergent at first, they can readily be combined within the structure of
usability testing proposed by Hornbæk [Hor06]. Based on a review of 180 studies,
published in core HCI journals and proceedings, he argues for a change in terminol- 
ogy from the ISO 9241, to better encompass what is actually being measured. 

Effectiveness and efficiency are often measured in a more objective fashion than 
the user self-estimations of the PET-USES. The efficiency of a given interface
can, for instance, be measured in terms of task completion time and its effectiveness in
terms of quality of task solution [FHH00]. Ideally, usability evaluations
should combine self-estimation with such more objective
measurements. It should, however, be pointed out that these types of measurement
require fully functional interfaces, whilst the PET-USES can be used in a much ear- 
lier stage to measure users’ perception as estimates of effectiveness and efficiency. 

The PET-USES scale General Usability is measured as a composite of the sub- 
scales Ease of Learning, Ease of Use and User Value. The rationale for differen- 
tiating between the subscales Ease of Learning and Ease of Use is that intuitive 
interfaces are perceived to have a better learnability whereas a less intuitive inter- 
face can be used easily only once the user is accustomed to it. It is also noteworthy 
that the General Usability value will be less influenced by perceived User Value than
by Ease of Learning and Ease of Use. This reflects the fact that, although user value is
an important driver for software adoption, the focus lies more on the usability of a
product than its perceived benefits. The PET-aspects modules currently developed
are: Data Management, Credential Management, PrivPrefs, Recipient Evaluation,
Data Release, and History. They can all be used to evaluate specific PET-related
functionality of software or websites. (See the Appendix for the entire PET-USES
questionnaire.)


1 PrivPrefs (Privacy Preferences) is a method that is currently being investigated in the PrimeLife
project for defining personal privacy preferences which will be used for automated evaluations of
the appropriateness of data requests. The PrivPrefs are similar to the privacy preferences as defined
in P3P (www.w3.org/P3P).


The scales focus on the following privacy-critical areas:


• Data Management: The extent to which the system makes it easier to store and
organise personal information. This scale can be used to evaluate all types of
identity management software and services.

• Credential Management: The extent to which the system makes it easier to store
and organise certificates and credentials. This scale can be used to evaluate
identity management systems that include issued claim credentials (e.g., the Higgins
project²).

• PrivPrefs: The extent to which the system makes
it easier to set general and excessive levels for data release policies and to what
extent the user is informed of unwanted data dissemination. Thus, an aspect of
this scale is the decision support qualities of the system.

• Recipient Evaluation: The extent to which the system helps users evaluate the
data recipients’ credibility and trustworthiness. This scale can also be regarded
in terms of decision support.

• Data Release: The extent to which the system clarifies what personal information
is being released and who the recipient of the data is.

• History: The extent to which the system can show the user when, what, and to
whom personal information has been released and thus provide an overview of
what data any given service provider might have accumulated.


10.2 PET-USES in Practice 


So far, the PET-USES has been used in a few different settings and for different
purposes. Although it has been used both by researchers and in commercial
activities outside the PrimeLife project, to the best of our knowledge its main use has
been to evaluate the PrimeLife prototypes. Based on our own experience and feedback
from others, the modularity of the PET-USES is a much appreciated feature, as
it allows the test to be tailored to fit the test scenario.


2 www.eclipse.org/higgins/ 


10.2.1 When to use the PET-USES


The main reason for conducting usability tests is to discriminate between usable and
unusable interfaces, either during the design process or in comparisons between
different systems. Typical use cases for the PET-USES include both scenarios: it can
be used to compare the perceived usability strengths and weaknesses of different
interfaces, and to aid interface designers during the design process by administering
the test at various steps. However, as with all statistical testing, the possibility of
finding significant results depends on the a priori power of the investigation. When
comparing existing interfaces, more power can be achieved either by comparing
interfaces that are very good with interfaces that are very bad, which yields a larger
effect size, or by enrolling more users in the test. During interface design, especially
during fast iterations, the differences between versions are usually quite small and so
is the tested user group; hence the power of a test such as the PET-USES will be
low. This should be taken into consideration when planning when to use the
PET-USES, as it will be more useful when evaluating clear steps in the design
process. In order to gain power by adding more respondents without having to run a
great number of complete user tests, it is possible to do large group walkthroughs of
screen recordings. An additional feature of this method is that it makes it possible to
run user tests on interfaces without any functionality.
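
The relation between effect size, group size and power described above can be made concrete with a rough sample-size calculation. The following is a minimal sketch, not part of the PET-USES itself: it assumes a between-subjects comparison of mean scale scores for two interfaces, a two-sided test, and the standard normal-approximation formula. The function name users_per_group and the default values for alpha and power are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def users_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of users needed per interface.

    effect_size is Cohen's d: the difference in mean scale scores divided by
    the pooled standard deviation. The normal approximation used here gives
    slightly smaller numbers than an exact t-test based calculation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Conventional "small", "medium" and "large" effect sizes.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {users_per_group(d)} users per interface")
```

Under these assumptions, a small difference between two design iterations requires several hundred respondents per version, whereas a large difference between a very good and a very bad interface can be detected with a few dozen. This is why group walkthroughs of screen recordings are attractive when only small differences are expected.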


10.2.2 How to use the PET-USES 


In order to facilitate both the use and the evaluation of the PET-USES, a web
service³ has been set up at CURE. The web service enables research companies to
use the PET-USES questionnaire for their evaluations and will be open to all who
wish to use the PET-USES on the condition that the collected PET-USES data will
be used to gather feedback and further develop the questionnaire and its scales. In
addition to using the scales of the PET-USES, researchers in this area will be able
to suggest new modules for inclusion in the sub-scale battery to reflect
the ever-changing field of PETs. Data provided on the website will be anonymised
and treated confidentially. Only those conducting the research and the creators of
the PET-USES (i.e., Karlstad University and CURE) will have access to the data
provided. Users of the site who wish to retain data from sources other than the
PET-USES are of course allowed to do so, but in order to evaluate the PET-USES, users
are encouraged to provide data that can facilitate the validation of the test.


3 http://pet-uses.cure.at


10.3 Conclusions


The PET-USES test presented in this chapter is a usability questionnaire that
measures aspects of General Usability and includes specifically tailored scales
that measure the usability of PET solutions. The test is grounded in current views
on usability, and the experience of using it so far shows that both practitioners
and users find the PET-USES an easy-to-use and informative tool. The
CURE web service for using the PET-USES is open to PET researchers who wish
to evaluate PET UIs. Our hope is that it will become part of the PET development and
evaluation toolkit used by usability researchers interested in the area of PETs.


10.4 Appendix: PET-USES[1.0] 


Note: the headings and numerals in the following test are mainly for presentational
purposes and are thus optional when the PET-USES is used. Items 2, 3, 7, 8, and 21
should be reverse-scored before the items are summed.


10.4.1 Instructions 


This test is designed to measure your experience with the system you’ve tested
today. Your answers will be used to evaluate the system, so please answer the questions
as truthfully as you can. As the questions are designed to measure various aspects
of the system’s usability, there are no right or wrong answers. Please use the scale
below to indicate to what extent you disagree or agree with the statements that follow.


1 Strongly disagree
2 Disagree
3 Neither agree nor disagree
4 Agree
5 Strongly agree


General usability

1. I found it easy to learn how to use the system.    1 2 3 4 5
2. I had to learn a lot in order to use the system.    1 2 3 4 5
3. I keep forgetting how to do things with this system.    1 2 3 4 5
4. I need a lot of assistance to use this system.    1 2 3 4 5
5. I find the system interface easy to use.    1 2 3 4 5
6. I find the organisation of the system interface understandable.    1 2 3 4 5
7. I get confused by the system interface.    1 2 3 4 5
8. I find it very difficult to work with the system.    1 2 3 4 5
9. I find that the benefits of using the system are bigger than the effort of using it.    1 2 3 4 5
10. I would like to use this system regularly.    1 2 3 4 5

Data management

11. I get a clear view of my personal data from the system.    1 2 3 4 5
12. I find organising my personal data easy with this system.    1 2 3 4 5
13. I find keeping track of various user names and passwords is easy with this system.    1 2 3 4 5

Credential management

14. I find it easy to add personally issued credentials into the system.    1 2 3 4 5
15. I find it easy to add / import certificates into the system.    1 2 3 4 5
16. I find it easy to manage my certificates and credentials.    1 2 3 4 5

PrivPrefs

17. I find it easy to use settings for how much or how little data to be released.    1 2 3 4 5
18. I find that the system helps me understand the effects of different privacy settings.    1 2 3 4 5
19. I feel safer knowing that I will be notified if I’m about to release more data than my chosen preference.    1 2 3 4 5

Recipient Evaluation

20. The system makes it easy to decide if it is safe to release my data.    1 2 3 4 5
21. I don’t understand how the system determines if a data recipient is trustworthy.    1 2 3 4 5
22. I feel safer releasing my personal data when the system states it’s ok.    1 2 3 4 5

Data Release

23. I know what personal information I’m releasing.    1 2 3 4 5
24. I find it easy to decide how much or how little data to release in a given transaction.    1 2 3 4 5
25. I get help from the system to understand who will receive my data.    1 2 3 4 5

History

26. I can easily find out who has received my personal data with this system.    1 2 3 4 5
27. I get a good view of who knows what about me from this system.    1 2 3 4 5
28. I can easily see how much I’ve used a particular username with this system.    1 2 3 4 5
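
The scoring rule stated in the note at the start of this appendix (reverse items 2, 3, 7, 8, and 21, then sum) can be sketched in a few lines of code. This is only an illustration and not the scoring implementation of the CURE web service: the function name score_pet_uses is hypothetical, the assignment of items to scales is read off the questionnaire headings, and skipping incompletely answered scales is an assumption made so that modules that were not administered can simply be left out. The example answers are invented.

```python
from typing import Dict

# Items to reverse-score, as listed in the note at the start of this appendix.
REVERSED_ITEMS = {2, 3, 7, 8, 21}

# Item numbers per scale, read off the headings of the questionnaire.
SCALES = {
    "General usability": range(1, 11),
    "Data management": range(11, 14),
    "Credential management": range(14, 17),
    "PrivPrefs": range(17, 20),
    "Recipient Evaluation": range(20, 23),
    "Data Release": range(23, 26),
    "History": range(26, 29),
}


def score_pet_uses(answers: Dict[int, int]) -> Dict[str, int]:
    """Return the summated score per scale for one respondent.

    answers maps item number -> rating on the 1-5 agreement scale.
    Assumption: a scale with any unanswered item is skipped.
    """
    for item, rating in answers.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"item {item}: rating {rating} is outside 1-5")
    scores = {}
    for scale, items in SCALES.items():
        if not all(i in answers for i in items):
            continue  # module not administered or incomplete
        # On a 1-5 scale, reversing a rating x gives 6 - x.
        scores[scale] = sum(
            6 - answers[i] if i in REVERSED_ITEMS else answers[i] for i in items
        )
    return scores


# Invented answers from one respondent who completed two modules.
example = {1: 5, 2: 1, 3: 2, 4: 2, 5: 4, 6: 4, 7: 1, 8: 1, 9: 4, 10: 5,
           26: 4, 27: 3, 28: 4}
print(score_pet_uses(example))
```

Because the scales have different numbers of items, summated scores are only comparable within the same scale; dividing by the number of items would be one way to compare across scales.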


Chapter 11
HCI for PrimeLife Prototypes 


Cornelia Graf, Peter Wolkerstorfer, Christina Hochleitner, Erik Wästlund, and
Manfred Tscheligi


Abstract User-centered design (UCD) processes need to be further extended to the 
field of privacy enhancing technologies (PETs). The goal of the UCD process for
PETs is to empower users to manage their privacy on the
Web. Taking care of privacy and being careful while surfing the Web are still considered
to be cumbersome and time-consuming activities. Hence, PrimeLife aspires to
provide easy-to-use tools for users to manage their privacy. This chapter describes the
challenges in UCD that arose during the development of the PrimeLife prototypes. 
As part of the HCI activities in the PrimeLife project, we have researched the users’ 
attitudes towards privacy and discovered the main challenges when developing user- 
friendly PETs. We use two example prototypes to explain how the challenges can 
be tackled in practice. In general, PETs should neither require much of the user’s 
attention and time, nor should they require particular technical knowledge. They 
should, in fact, present the complex methods of privacy enhancing technologies in 
an easy, understandable and usable way. We will conclude this chapter with a dis- 
cussion of our findings and implications for further development of user-centered 
privacy enhancing technologies. 


11.1 Introduction 


One of the main goals of the HCI activities within the PrimeLife project was the
design, development and evaluation of usable and understandable privacy enhancing
technology (PET) prototypes. This included extending user-centered design (UCD)
processes and advancing existing methods so that they are applicable to the
particular needs and challenges of PET prototypes. The PrimeLife prototypes were
also developed to answer research questions by means of user evaluations. As part
of this process, researchers encountered several challenges that had to be solved in
order to accomplish the above stated goal. In the present chapter, these challenges
are identified and briefly outlined in Section 11.2. The main goal of this chapter is
not to describe the challenges in depth, but to provide an overview of possible issues
and solutions when developing PETs. Section 11.3 presents the approaches with
which these challenges were tackled as part of the PrimeLife project. How the
suggested solutions were applied within a UCD process is described in Section 11.4.
The findings presented in this chapter are discussed in Section 11.5. Furthermore,
conclusions are drawn and an outlook on further research is provided.


11.2 Overview of HCI challenges 


In this section we will outline the HCI and UCD challenges we identified while 
working on the design and evaluation of various PrimeLife prototypes. 


11.2.1 Challenge 1: Limited User Knowledge of PETs 


When designing and developing standard software, developers usually rely on 
knowledge in the form of research and products that already exist. Another pos- 
sible source of information is the user’s mind [Nor88], i.e., the user’s knowledge 
about PETs and the application of these technologies. 

Knowledge about privacy enhancing technologies in the mind of the users is 
still fairly limited. Hence, relying on this knowledge when design decisions are 
made in the area of PETs is likely to lead to unusable results. Several evaluations 
conducted throughout the duration of the PrimeLife project have indicated that the 
users’ knowledge of privacy on the Web is rising. We see this as a consequence of 
more public occurrences and lively discussions about privacy in mass media, espe- 
cially in connection to data disclosure and social networks [GA05]. Our experience 
gained during the PrimeLife project shows that an increasing number of users are 
interested in privacy and in active privacy protection. 

Unfortunately, most users still think that privacy protection is very time-consuming,
too complicated, or cannot be achieved because it is in the hands of the service
providers. Through several user evaluations within the last three years [KWW08,
KWGT09, GWKT10], we have observed an increase in user awareness of privacy
enhancing technologies and privacy issues in general. This might also be caused by
increased media coverage of and attention to social networks and larger cases of
industrial data loss. For the users, this means that the knowledge in their minds is
increasing, and based on our research it is foreseeable that generally applicable
guidelines for the best way of designing PETs will evolve over the next years.


11.2.2 Challenge 2: Technologically Driven Development of PETs


Currently, PETs are mainly developed from a technological viewpoint [WT99,
SBHK06]. It seems to lie in the nature of new and rising domains in software
development that they emerge from a rather technological viewpoint. Thus, they are
targeted at technologically minded expert users who are skilled in the use of complex
interfaces offering a vast number of interaction possibilities and options. Once
a domain is established, users as well as developers care about usability and user
experience (the user’s perception of the system); unfortunately, this is not enough. HCI
engineers know that users should be included as early as possible in a project, a fact
that is not widely acknowledged in the creation of new software domains [Iiv04]. As a
result, unusable software is created within these novel areas. Understanding and
accessing the fast growing selection of PETs and their background requires technical
knowledge on the users’ side. This renders a lot of PETs unusable for most
users. Since PETs are part of the aforementioned novel software domains, the
evolution from technologically driven development towards user-centered development
poses a major challenge.


11.2.3 Challenge 3: Understanding PET Related Terms 


In order to use software successfully, users have to understand the meaning of the text
and labels that appear in the user interfaces. Research during the first of the three
PrimeLife project years indicated that users have problems understanding
privacy-related terms and privacy policies. Several other studies also
showed that the language and format of privacy policies are hardly understandable
for most users [CGA06, FBL04, Rod03, SSM10]. There are different approaches
to overcoming the problem of policy understanding. Kelley et al. presented a way to
display websites’ privacy policies in a more user-friendly way [KBCR09]. Our studies
confirm these findings for complex terms (e.g., “privacy preference”), but we have
also seen that basic terms, such as “privacy policy,” are commonly understood. The
study was conducted as an online evaluation in which 73 volunteers participated. The
participants defined their understanding of terms and rated their level of
understanding. In a second step, the accuracy of these definitions was rated by security and
privacy experts. Our results showed that users are able to understand most privacy
terms; even privacy terms presented out of context were understandable for most users.


11.2.4 Challenge 4: Wrong Mental Models of PETs 


Mental models describe how users expect things to work. Hence, they are the basis
for interaction decisions. Various evaluations of PrimeLife prototypes demonstrated
that, on the one hand, users have an incorrect mental model of the meaning and
importance of privacy. On the other hand, incorrect mental models of the Internet
itself hinder the development of working mental models about private data on the
Web. Mental model research is therefore a very important methodological extension to
research on the users’ ideas, behaviour and assumptions about PETs. To increase our
knowledge in this area, we conducted a mental model study with 17 participants, in
which we investigated users’ ideas of how data handling and data storage on the Web
may work. Our results showed that the users’ understanding of this topic is incorrect and
that there is much need for explaining how data handling and data storage work
in reality. Figure 11.1 shows an example of the users’ understanding of how data
travels through the Internet. In this case, the user believes that the information being
accessed is transmitted from the original source via Microsoft.


[Figure: participant’s sketch connecting “Microsoft” and “My Computer”]


Fig. 11.1: Participants’ assumption on how data travels through the Web. 


11.2.5 Challenge 5: Privacy as a Secondary Task 


Managing privacy is not a primary task for users. Buying a book online is a primary
task, while managing privacy while doing so is a secondary task. Users focus their
attention on the primary task; everything that is secondary does not receive similar
cognitive resources. Privacy protection is a support action for users when dealing
with their primary task on the Web. Consequently, as it often gets in the way,
users tend to ignore PETs in general [SF05]. In the PrimeLife project, as in
the security domain (where no user sits down in the evening with the idea of
managing firewall settings as a leisure activity), we have seen that the amount of
resources put into privacy management must be in due proportion to the task to
be completed. Participants in PrimeLife evaluations often stated that installing and
personalising a tool for privacy protection should not take longer than five to ten
minutes.


11.2.6 Challenge 6: Complex Mechanisms are Hard to Understand


PETs are not easy to understand because they are based on complex models
and therefore on complex background mechanisms, such as public-key encryption
[WT99] or anonymous credentials. These mechanisms are difficult to communicate
and explain to users, who often lack the necessary technical knowledge and
experience. Aside from manuals (which are known not to be read by users [NW06]), the
user interface is the only means of communicating these concepts to the user. When
talking about complexity and private data on the Web, we have to take into account
that multiple sources of complexity are involved. The PET-related concepts (e.g.,
anonymous credentials) add to the complexity experienced by the user.


11.3 Tackling the Challenges 


Together with recent discussions about privacy issues in mass media [GA05], users
have become more aware of the concept of privacy and PETs in general. Nevertheless,
privacy still cannot be considered to be of particular importance to the wider
public. Therefore, awareness training by projects such as PrimeLife, but also by public
bodies, is of particular importance in counteracting the current status. To empower
users to take care of their privacy, the challenges outlined in Section 11.2 need
to be addressed. The following sections provide an overview of how the challenges
were answered as part of the PrimeLife project. The approaches are described here
as general methods; their application is described by means of
two example prototypes in Section 11.4.


11.3.1 Limited User Knowledge of PETs 


In terms of interface design, we have found that users are aided in understanding
PETs and privacy concepts when these are presented through clear interfaces
and structures that display privacy aspects and possible threats in an understandable
way, as described in the pattern collection developed for the PrimeLife interfaces
[Pri10a]. Furthermore, the use of interfaces by less experienced persons as well as
expert users can be facilitated by providing two different views in the
interface: a standard view for novice and average users that provides basic settings
and does not need much configuration from the users’ side, and an expert view that
is especially designed for technologically minded persons who set up and configure
their own preferences. This principle was also applied in the creation of the backup
prototype described in Section 11.4.1. There, separate views for novice users and
experts facilitate the understanding of privacy concepts and allow for more intuitive
interaction and fine-grained control.
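
The idea of a standard view and an expert view over the same set of settings can be sketched as follows. This is a minimal illustration under stated assumptions, not the design of the PrimeLife backup prototype: the setting names and default values are invented, and tagging each setting with the lowest view in which it appears is just one possible way to structure such an interface.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class View(IntEnum):
    STANDARD = 1  # novice and average users: few settings, preset defaults
    EXPERT = 2    # technologically minded users: full configuration


@dataclass(frozen=True)
class Setting:
    name: str
    default: str
    min_view: View  # lowest view in which the setting is shown


# Hypothetical settings, for illustration only.
SETTINGS = [
    Setting("Backup frequency", "weekly", View.STANDARD),
    Setting("Who may access my backup", "nobody", View.STANDARD),
    Setting("Encryption key length", "default", View.EXPERT),
    Setting("Delegation expiry rules", "default", View.EXPERT),
]


def visible_settings(view: View) -> List[str]:
    """Names of the settings shown in the given view."""
    return [s.name for s in SETTINGS if s.min_view <= view]


print(visible_settings(View.STANDARD))
print(visible_settings(View.EXPERT))
```

Settings hidden in the standard view keep their defaults, so a novice user is never asked to configure them, while the expert view exposes the complete list.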


11.3.2 Technologically Driven Development of PETs


Based on our research results within the PrimeLife project, we believe that increasing
the users’ knowledge about privacy also eases their understanding of privacy
concepts. Results of our research on user understanding and mental models of PETs
can be found in [GKW+10, GWKT10, PWG10]. Privacy concepts can be easily
conveyed through the user interface. Again, clear structures and visual aids, such
as icons, can assist in understanding the technology. The interfaces have to be
self-explanatory and usable by novice users through the employment of already known
concepts (e.g., nutrition labels as privacy indicators [KBCR09]). Furthermore,
privacy patterns [Pri10a, GWGT10] developed as part of the PrimeLife project
provide a basis for the creation of user-friendly PET interfaces.

As mentioned in Section 11.3.1, it is possible to create different approaches to
the technology for different user groups. Thus, novice and less experienced users
need a rather simple interface that requires less technological knowledge and does
not overwhelm them with too much information and too many possibilities. In contrast,
more experienced persons need an expert view that gives access to the
advanced interaction possibilities and full control of privacy settings.


11.3.3 Understanding of PET Related Terms 


Besides increasing the users’ knowledge of privacy and PETs in general (cf.
Section 11.3.1), it is essential that the employed wording is also understood by
the users. In several evaluations, the users’ perception of privacy, security
and PETs, as well as their understanding of connected processes, have been
investigated [GKW+10, GWKT10]. This research has underlined the need for
understandable wording and further explanation of unknown words. Understanding the
terms should require a university degree neither in law nor in the field of security and
privacy. Kelley et al. [KBCR09] presented a way to display website privacy policies in a
more user-friendly way by creating an information design that improved the
comprehensibility and visual presentation of privacy policies. The design drew on
nutrition, warning, and energy labelling, and the authors demonstrated that, compared
to existing privacy policies, the newly proposed privacy label allowed participants to
find information more accurately and quickly, and provided a more enjoyable
experience. A similar approach was used to create understandable and intuitive icons
for use within the PrimeLife project [HHN10].

Research throughout the PrimeLife project has indicated that a quick evaluation
of terms with only a few (non-expert) users can indicate which words
should be avoided in interfaces [KWGT09]. As part of the PrimeLife HCI activities,
we have investigated several privacy terms that were also used in the created
prototypes [GWKT10].

We identified the following five terms as being easiest to understand: 

• Privacy protection
• Required data
• Digital traces
• Identity management
• Full privacy policy


The terms rated as being very hard to understand are the following: 


• Anonymous credentials
• Privacy preference
• Linkability
• Privacy enhancing


11.3.4 Wrong Mental Models of PETs 


A strong, user-centered design process can facilitate research on users’ mental
models as well as counter-activities to correct false assumptions. As part of the
HCI activities within PrimeLife, research on users’ mental models was conducted
[GKW+10] and applied to the developed prototypes. Through a well-founded
requirement-gathering process, it is possible to form a picture of the users’
understanding of how privacy and PETs work. Although important in any
user-centered process, the gathering of the users’ mental models becomes even more
important in novel domains such as PETs. It has to be ensured that the mental models
are respected in the user interfaces. Furthermore, it is important to periodically
review and update the researched mental models. This is necessary since privacy, as
part of the ICT domain, is prone to fast development and rapid changes.

As part of the user-centered design approach, users from the target group are
involved in the design process, so that misunderstandings and false assumptions
are discovered and can be clarified. It helps to provide mechanisms that assist
users in understanding how the developed tool works. Furthermore, practical
examples assist in correcting false assumptions. For all prototypes created within
PrimeLife, care was taken to consider user input as well as user feedback at various
stages of the development process. The development of the backup prototype
[Pri10b] (Section 11.4.1) adhered to this focus on users in an exemplary way.


11.3.5 Privacy as a Secondary Task 


Since privacy activities are considered to be cumbersome, the most efficient way to
solve this challenge is to create interfaces and tools that do not need much time
and effort from the user side (see privacy patterns [Pri10a]). The PETs should work
in the background: users should be able to access them easily, and they should provide
support only when necessary and not disturb the users in their workflow with
unnecessary pop-ups. The ease of use of interfaces in the complex setting of PETs
was a central point in the PrimeLife project and was also applied to the developed
prototypes [Pri10b].


11.3.6 Complex Mechanisms are Hard to Understand 


Even for advanced users, privacy is a very complex concept. Furthermore, the em- 
ployed interfaces usually try to convey very complex mechanisms in the background 
[KWW08]. Most users are overwhelmed with the kinds and amounts of information 
provided. Thus, it is important to get a clear picture of the expectations and knowl- 
edge of the users and to adapt the interfaces to this knowledge [GWGT10]. The 
information presented to the user should not be complex, but should contain clearly 
structured information. In order to reduce complexity, only important information 
should be presented using self-explanatory visual concepts such as icons [HHN10]. 
Any software can present great approaches and ideas, but as long as users do not
understand it, they will not accept and use it. Therefore, strong visual concepts should
be employed; supportive visualisation techniques such as icons present PETs in a
more understandable way. This is also supported by our findings in the
PrimeLife project (e.g., research on privacy icons [Pri09c]).


11.4 HCI Activities and Software Development 


In Section 11.3, we described how we have addressed the challenges within the 
PrimeLife project on an abstract level. Given the nature of the different prototypes 
and the different HCI approaches, not all challenges could be addressed to the same 
extent for each prototype. In practice, there are different ways of including HCI 
work in software development processes. In general, HCI engineers have to adapt
their methods and fine-tune them to fit different development styles such as extreme
programming [WTS+08].

This section focuses on the process perspective and shows the practical integra- 
tion of the above introduced challenges into the PET software development pro- 
cesses in PrimeLife. 


11.4.1 Backup Prototype 


The main purpose of the backup prototype is to give the user the possibility to create 
backups of data and delegate access rights. For example, in case of illness the user is 
able to delegate access rights to other persons, which is an important privacy issue. 
The development of the backup prototype as part of the PrimeLife project can
be considered as a best case approach to solve the challenges introduced above. 
It was user-centered and HCI driven from the very beginning. In fact, the entire 
specification of the prototype was based on user research (e.g., requirements) and 
completed by HCI engineers. 

For example, Challenge 1: limited end user knowledge of PETs was addressed 
through the development of different views in the interface (standard view and ex- 
pert view) based on gathered user requirements. The standard view provides all 
necessary elements for creating backups and delegating access rights. The expert 
view should provide in-depth settings for backups and delegation, mainly used by 
expert users. 

When the user interface design of this prototype started, no technical
specifications were available. Therefore, the whole UI was driven by a strong focus on user
needs. Furthermore, we focused on human-computer interaction paradigms and
usability guidelines; possible technical matters were ignored during the first part of
the UI design. In the second iteration, we adapted the developed prototype to the
technical specifications without changing the interaction paradigms. Thus, the
entire UI assists the user in understanding how the system works and does not require
any special knowledge of PETs and privacy from the user, directly addressing
Challenge 2: technologically driven development of PETs. Additionally,
we focused on the usage of understandable terms, mainly employing common
terms and explaining novel and unknown terms (Challenge 3: understanding
of PET related terms).

To reduce the cognitive load of users when dealing with this application, we
concentrated on the usage of elements, in addition to the terms, which are familiar
to users. We also made the workflow of the tool transparent for users by showing
them consequences of providing unnecessary or technical information. In this way,
we hoped to correct wrong mental models, addressing Challenge 4: wrong
mental models of PETs.

As mentioned before, the user interface of the backup prototype is based on the 
needs of the users and not on technical requirements. The design of the prototype 
presents the workflow and interaction possibilities as simply as possible, although the
functionality behind the algorithms is very complex (Challenge 6: complex mechanisms
are hard to understand).

Thus, user involvement was assured and several of the above introduced chal- 
lenges, such as problems in understanding privacy and PET related terms as well 
as mental models and complex structures could be addressed and counteracted with 
the above introduced methods at a very early stage of development. 


11.4.2 Privacy Dashboard 


The privacy dashboard indicates the security of websites and allows for fine-grained 
handling of privacy information. In this approach, the HCI engineer did not design
mock-ups but supported implementation by directly working on the source code of
the user interface for the privacy dashboard.

In the development of the privacy dashboard, the limited user knowledge of PETs
(Challenge 1) was counteracted by providing, in addition to the standard view,
personalisation possibilities for each website and by introducing awareness
mechanisms to the user. To ease understanding of PET related terms (Challenge 3), the use
of PET-specific wording as well as abbreviations was avoided (de-technification
of terms), and definitions and explanations were provided for PET-related
terminology (e.g., “third party cookie”). Additionally, the results of the privacy terms study
[GWKT10] were used to distinguish easily understandable terms from PET-related
wording.

Challenge 2: technologically driven development of PETs was met through
continuous heuristic evaluations of the developed prototypes and user evaluations.
Thus, direct feedback from either experts or users could be included in the
development process, continuously improving the quality of the prototype.

For the privacy dashboard, it was particularly important to create an interface
that is easy to use and takes little time to operate, answering Challenge 5:
privacy as a secondary task. Therefore, the main interaction when visiting a new
website was reduced to two clicks, with which users can adjust their privacy
settings. The settings are then saved for further visits to the website.
Furthermore, the risks of a website are presented in an understandable way (even
if complex mechanisms are described). Therefore Challenge 6: complex mechanisms
are hard to understand was also addressed by the privacy dashboard.

The design approach described here proved to be very effective. A precondition
for an efficient development workflow, which also applied to the privacy
dashboard, is that relevant tools and workflows are decided upon and put in place
(e.g., SVN for managing the source files to ensure that re-engineered UI code will
be part of the iterations). This also requires the HCI engineer to have knowledge of
the programming and markup languages used, which is not always the case.


11.4.3 Examples Reflected 


In the above sections, we showed different approaches to answering the
challenges described in Section 11.2 during the development of two PrimeLife
prototypes. In practice, how well the challenges can be met depends on a number of
criteria, such as the collaboration between developers and HCI engineers or the
fidelity of the prototype when HCI activities are started.

The experience of the PrimeLife project showed that early inclusion of HCI
knowledge is a key success factor that provides answers to many of the challenges
introduced above. For PETs themselves, it is not crucial which UCD process is
applied, as long as HCI activities are adopted as early as possible. The kind of
HCI evaluations conducted and the feedback provided to development are
important.


11.5 Discussion and Outlook 


In this chapter, we demonstrated that current PETs lack usability and that
developers of PETs will encounter several of the above-mentioned challenges, since
the entire field of PETs is still at an early stage of development. This goes hand
in hand with knowledge deficits of users: after multiple iterations of user
evaluations, we conclude that the lack of privacy knowledge calls for a focus on
mental models and on users’ understanding of terms and background mechanics, which
is the focus of our adaptations to the UCD process. Based on our experience during
the PrimeLife project, we presented several approaches and practical examples of
how common UCD processes can be enhanced and adapted to PETs.

As the field of ICT, and of PETs in particular, is developing very fast, we expect
the suggested processes to be used and the challenges to be addressed quickly in
the near future. This will also foster the adoption of PETs by a wider public and
hence lead to “privacy by design.”

As an outlook on possible future methods for evaluations of PETs, we pro- 
vide a review of user involvement methods defined in ISO/TR 16982:2002(E) 
“Ergonomics of human-system interaction—Usability methods supporting human- 
centered design:” 


• Observation of users: Systematic collection of observation material is done in
usability laboratory evaluations. Test leaders are used to observe not only what
users are doing but also how they react emotionally. For PET evaluation, we
recommend having an experienced test leader observe. This is because non-verbal
cues are much more important in discovering a user’s attitudes towards PETs than
towards conventional software products such as websites.

• Performance related measurements: It is important to implement special PET
performance measures. The “classical” performance measures (task completion
time, error rate) should be extended with data disclosure measures. In
combination with questionnaires and/or interviews, such measurements will provide
the needed knowledge on how to communicate complex privacy issues.

• Critical incidents analysis: Critical incidents in the PET domain, as we
observed during the PrimeLife project, are a welcome extension to the error rate
measurement.

• Questionnaires/interviews: When answering questions in the HCI/PET domain,
users will not be able to express how they will behave when using the software.
Questionnaires/interviews are worthwhile when measurements and/or observations
provide data to base the analysis on. For example, the users can be asked with
a questionnaire or during an interview which private data they think they have
disclosed to others. The technical measurement (a log file analysis in this case)
will tell if the PET succeeded in supporting the user to preserve privacy or not.

• Thinking aloud: This method, where users continuously verbalise their ideas,
beliefs, expectations, doubts, discoveries, etc. during the evaluation, is
well-suited to give insight into the mental models of the users. We recommend
including the thinking aloud methodology as a default in every PET design process.

• Collaborative design and evaluation/creativity methods: Because knowledge about
PETs is neither widespread nor readily available, collaborative design and
creativity methods have not been very helpful in our experience. However, we
assume that collaborative design with a focus on the mediation between user and
developer is a promising approach. Therefore, further research is needed on this
issue.
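
The combination of classical performance measures with data disclosure measures, and the comparison of questionnaire answers with a log file analysis, can be sketched as follows. This is a minimal illustration and not the instrumentation used in PrimeLife: the `Session` record, its field names, and the three set-difference measures are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One participant's run through a test task (all fields are hypothetical)."""
    seconds: float        # task completion time
    errors: int           # number of observed errors
    actions: int          # number of user actions, used to normalise errors
    required: frozenset   # data items the task actually required
    disclosed: frozenset  # data items found in the log file
    reported: frozenset   # data items the participant believes were disclosed


def summarise(session: Session) -> dict:
    """Combine classical measures with simple data disclosure measures."""
    return {
        "completion_time_s": session.seconds,
        "error_rate": session.errors / session.actions if session.actions else 0.0,
        # Items sent although the task did not need them.
        "over_disclosed": sorted(session.disclosed - session.required),
        # Items the participant is unaware of having sent.
        "unnoticed": sorted(session.disclosed - session.reported),
        # Items the participant wrongly believes were sent.
        "imagined": sorted(session.reported - session.disclosed),
    }


s = Session(
    seconds=92.0, errors=2, actions=20,
    required=frozenset({"name"}),
    disclosed=frozenset({"name", "email"}),
    reported=frozenset({"name", "address"}),
)
print(summarise(s))
```

Keeping the logged and the self-reported disclosures as separate sets makes the gap between what the PET did and what the user believes it did directly measurable.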


Chapter 12 


The Users’ Mental Models’ Effect on their 
Comprehension of Anonymous Credentials 


Erik Wästlund and Simone Fischer-Hübner


Abstract Anonymous Credentials are a key technology for enforcing data
minimisation for online applications. The design of easily understandable user
interfaces for the use of anonymous credentials is however a major challenge, as
end users are not yet familiar with this rather new and complex technology and no
obvious real-world analogies exist for them. In this chapter, we analyse what
effects the users’ mental models have on their understanding of the data
minimisation property of anonymous credentials in the context of an e-Shopping
application scenario. In particular, we have investigated the effects of the
mental models of a card-based user interface approach and an attribute-based user
interface approach and compared these in terms of errors of omission and addition.
The results show that the card-based approach leads to significantly more errors
of addition (i.e., users believe that they have disclosed more information than
they actually have) whereas the attribute-based approach leads to more errors of
omission (i.e., users underestimate the amount of data that they have disclosed).


12.1 Introduction 


A fundamental privacy design principle is data minimisation, meaning that services
or applications should be designed with the aim of collecting and processing as
little personal data as possible. Data minimisation limits the communication
partner’s ability to profile users and is, as a legal principle, well acknowledged
by most Western privacy laws. It can in particular be derived from Art. 6 I (c),
6 I (e) of the EU Data Protection Directive 95/46/EC [Dir95] and is, for instance,
also required explicitly by Section 3a of the German Federal Data Protection Act
[Ger09]. Anonymous credentials are a key technology for achieving data
minimisation on an application level. The aim of this chapter is to present our
work on creating UIs (user interfaces) with the objective of making users
comprehend the data minimisation property of anonymous credentials. Anonymous
credentials are becoming increasingly significant not only in research but also in
practice: the identity mixer (Idemix) anonymous credential protocol [CL01] used in
PrimeLife is contributed by IBM Open Source, and Microsoft has been incorporating
the U-Prove anonymous credential technology into Windows Communication Foundation
and Windows CardSpace.

Mental models can be understood as thought models of how a system works. The
mental model of a system is often divided into three parts: the view of the
programmers, the view of the user, and the UI design [Coo95] [Nor88]. In this
chapter, we will describe the effects of inducing different mental models on
UIs that support data minimisation as well as various comprehension problems that
users have due to their mental models of the card-based metaphor and of
privacy-enhanced e-Shopping transactions in general.


12.1.1 Anonymous Credentials 


A traditional credential (often also called certificate or attribute certificate)
is a set of personal attributes, such as date of birth, name or personal number,
signed (and thereby certified) by the certifying party and bound to its owner by
cryptographic means (e.g., by requiring the owner’s secret key to use the
credential). In terms of privacy, the use of (traditional or anonymous)
credentials is better than a direct request to a certifying party, as this
prevents the certifying party from profiling the user [CSS+05]. Traditional
credentials require, however, that all attributes are disclosed together if the
owner wants to prove certain properties. This makes different uses of the same
credential linkable to each other.

Anonymous credentials (also called private certificates) were first introduced by
Chaum [Cha85] and later enhanced by Brands [Bra99] and by Camenisch and
Lysyanskaya with their Idemix protocol [CL01]; they have stronger privacy
properties than traditional credentials. Anonymous credentials, in contrast to
traditional credentials, allow a user to selectively reveal only a subset of her
attributes or to prove that she has a credential with specific properties without
revealing the credential itself or any additional information. For example, if a
user has a governmentally issued anonymous passport credential with personal
attributes typically stored in a passport, including her date of birth, and she
wants to purchase a video online which is only permitted for adults, she can
prove with her credential via a cryptographic zero-knowledge proof just the fact
that she is older than 18 without revealing her date of birth or any other
attributes of her credential. In another scenario, the holder can use her
anonymous passport credential for proving her gender or name. In other words,
anonymous credentials allow the selective disclosure of identity information
encoded into the credential.

In addition, the Idemix anonymous credential system also has the property that
multiple uses of the same credential cannot be linked to each other, which can
prevent the linking and profiling of different user sessions. If, for instance,
the user later wants to buy another video which is only permitted for adults at
the same online video store, she can use the same anonymous credential as proof
that she is over 18 without the video store being able to recognise that the two
proofs are based on the same credential. This means that the two transactions
cannot be linked to the same person. Hence, whereas traditional electronic
credentials have properties very similar to those of real-world credentials,
anonymous credentials have further data minimisation characteristics that differ
from those of real-world credentials. A special challenge for HCI (Human Computer
Interaction) representations of anonymous credentials is to illustrate the
following two characteristics:


• Selective Disclosure: Proving only some of the attributes of an anonymous
credential.
• Unlinkability: Multiple uses of the same anonymous credential cannot be linked.


The main focus of our empirical usability studies, which we present in this
chapter, has so far been on the comprehension of the selective data disclosure
property.
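
To make the selective disclosure property concrete, the following toy model shows which data leaves the user in the two passport scenarios described above. It is only a sketch under stated assumptions: it contains no cryptography (a real system such as Idemix would accompany the disclosure with a zero-knowledge proof), and the dictionary layout, the function names `disclose` and `prove_older_than`, and the attribute values are hypothetical.

```python
from datetime import date

# Hypothetical passport credential; attribute names and values are illustrative.
passport = {
    "issuer": "Swedish passport authority",
    "attributes": {
        "name": "Inga Vainstein",
        "date_of_birth": date(1980, 5, 17),
        "gender": "F",
    },
}


def disclose(credential: dict, requested: set) -> dict:
    """Return only the requested attributes plus the issuer.

    No proof is generated here; the function only models what the
    data recipient gets to see.
    """
    attrs = credential["attributes"]
    return {
        "issuer": credential["issuer"],
        "attributes": {k: attrs[k] for k in requested if k in attrs},
    }


def prove_older_than(credential: dict, years: int, today: date) -> dict:
    """Reveal only whether 'age > years' holds, not the date of birth."""
    dob = credential["attributes"]["date_of_birth"]
    # Age in full years: subtract one if this year's birthday is still ahead.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    return {"issuer": credential["issuer"], "claim": f"age > {years}", "holds": age > years}


print(disclose(passport, {"name"}))
print(prove_older_than(passport, 18, date(2010, 1, 1)))
```

In both cases the recipient learns the issuer in addition to the requested statement. Unlinkability cannot be shown in a model without cryptography and is therefore left out.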


12.1.2 Related Work 


Within the scope of the PRIME¹ project, our usability tests of PRIME prototypes
revealed that users often did not trust privacy-enhancing technologies and their
data minimisation properties, as the possibility to use Internet services
anonymously did not fit their mental model of Internet technology [PFHD+05]
[ACC*ce]. Camenisch et al. [CSSZ06] discuss contextual, browser-integrated user
interfaces for using anonymous credential systems. In user tests of anonymous
credential selection mockups developed within the PRIME project, test subjects
were asked to explain what information was actually given to a web site that
demanded some proof of age when a passport was used to produce that proof (more
precisely, the phrase “Proof of ‘age > 18’ [built on ‘Swedish Passport’]” was used
as a menu selection choice in the mockup). The test results showed that the test
users assumed that all data normally visible in the physical item referred to
(i.e., a passport) was also disclosed to the web site [WP008]. Hence, previous HCI
studies in the PRIME project already showed that designing user interfaces
supporting the comprehension of anonymous credentials and their selective
disclosure property is a challenging task.

More than 10 years ago, Whitten and Tygar [WT99] discussed the related prob- 
lem that the standard model of user interface design is not sufficient to make com- 
puter security usable to people who are not already knowledgeable in that area. They 
conclude that a valid conceptual model of security has to be established and must 
be quickly and effectively communicated to the user. 


¹ EU FP6 integrated project PRIME (Privacy and Identity Management for Europe),
https://www.prime-project.eu/


12.2 Performed User Tests


The issue of anonymous credential selection paradigms and mental models is closely
tied to users’ current understanding and usage of ordinary plastic credit and
identity cards. With good reason, proponents of the plastic card paradigm argue
that it is easily understood by users as it mimics what they already know and use
every day. Thus, explaining the basic idea of digital credentials is easily done
by comparing them to their plastic predecessors. The drawback of this metaphor is
that plastic cards do not lend themselves to selective disclosure. An alternative
to the card-based metaphor is the attribute-based approach, where identity
attributes are referenced as specific entities rather than as belonging to a
specific group, i.e., a card. In order to investigate which of the two paradigms
makes it easier for users to understand the principle of data minimisation, two
lines of UI variations were proposed. The UIs differed in terms of the selection
mechanism: the selection was done either by clicking on a representation of the
full card containing the desired attribute or by choosing the attribute from a
drop-down list containing the possible sources of the attribute.
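
The difference between the two selection mechanisms can be sketched with a small data model. This is only an illustration: the `credentials` dictionary, the attribute lists, and the two function names are hypothetical and are not taken from the tested prototypes.

```python
# Hypothetical credential store for a test persona; attribute lists are illustrative.
credentials = {
    "Passport": {
        "issuer": "Swedish passport authority",
        "attributes": {"name", "date_of_birth"},
    },
    "Driver's license": {
        "issuer": "Swedish road authority",
        "attributes": {"name", "licence_class"},
    },
}


def cards_containing(attribute: str) -> list:
    """Card-based selection: the user clicks a full card that holds the attribute."""
    return sorted(card for card, c in credentials.items() if attribute in c["attributes"])


def sources_for(attribute: str) -> list:
    """Attribute-based selection: the user picks a source of the attribute from a list."""
    return sorted(c["issuer"] for c in credentials.values() if attribute in c["attributes"])


print(cards_containing("name"))
print(sources_for("name"))
```

Both functions lead to the same disclosure (the attribute and its issuer); they differ only in whether the choice is presented to the user as a card or as an attribute with a source.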


12.2.1 Method 


All the user tests presented in this study have been conducted in the same fashion. 
The users have been introduced to the principle of selective disclosure, the system 
they are about to work with, and the eShopping task they are about to perform. After 
having performed their tasks, users were asked to describe what personal informa- 
tion they had conveyed to the data recipient. Our main interest with that question 
was to analyse if they understood the principle of selective disclosure or if the users 
would believe that they had sent additional information. 


12.2.1.1 Participants and Test Design 


The participants of the user tests were students enrolled at Karlstad University
(with the exception of a few employees), aged between 18 and 65, with the vast
majority being under 35. The participants were volunteers recruited around the
campus, and as participation was not part of any specific course, the participants
studied a variety of subjects. The tested UIs have been developed through a cyclic
process of UI prototyping, usability testing, and subsequent refinements and
redesigns of the user interfaces. Most of the user tests are small sample studies
and include only five participants. Nielsen [Nie00] points out that while
experiments showed that a test with at least 15 users is needed to discover all
the usability problems in the design, it is better to distribute the budget for
user testing across several iterations of mockup developments/improvements and
tests, e.g., it will be better to spend this budget on three tests with 5 users
each. After the first study with 5 users, usually 85% of the usability problems
will be found, and it is recommended to fix these problems in a redesign before
testing again. For this reason, we have also decided to do a series of iterative
redesigns of our policy mockups and tested the early iterations with only 5 test
persons in each iteration round. Having said this, it is important to point out
that the statistical analysis is done on a combined level, contrasting all UIs
in the respective approach, with 35 users testing the card-based UIs and 48 users
testing the attribute-based UIs.
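
The figures attributed to Nielsen above follow from a simple model in which each test user independently finds a fixed share of the usability problems. The sketch below assumes a per-user detection rate of 31%, the value commonly cited together with this model; that rate and the function name `share_found` are not stated in this chapter and are assumptions.

```python
def share_found(users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of usability problems found by `users` independent testers."""
    return 1.0 - (1.0 - per_user_rate) ** users


for n in (1, 5, 15):
    print(n, round(share_found(n), 3))
```

Under this assumption five users find roughly 85% of the problems and fifteen users find nearly all of them, which is the rationale for spending a fixed budget on several small test rounds.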

Independent of the credential selection paradigm, the general setting of the tests
was identical. In order to create a realistic experience for the test
participants, we created an interactive Firefox background for the user tests. The
background consisted of an image of the Firefox web browser, which in turn
contained a scrollable image of the “Buy Kindle Books” section of the Amazon.com
web page. The Amazon.com image was equipped with clickable links that would
activate the pop-up of the select credentials console and, thus, test participants
would get the familiar look and feel of the web site as it looks today up to the
point of paying for the purchase. After the test participants had been introduced
to the basic principles of selective disclosure, they were asked to purchase a
book of their own choice with the credentials that were available to them at the
time. After the purchase they were asked what data they had given to Amazon.com
during the transaction.

The users were all given the same scenario. They were to act as Inga Vainstein 
who had the anonymous credential system installed on her machine and use her cre- 
dentials to buy a book from Amazon.com. Inga had already imported anonymous 
credentials from the Swedish passport authority and the Swedish road authority and 
had also entered information about her Visa and Amex credit cards. The participants 
were told that they were to test a new and more secure payment system where they 
had to prove their right to use a credit card by proving that they had the same name 
as the credit card holder by using one of the two anonymous credentials. Indepen- 
dent of the paradigm, the correct response was that the only information they had 
conveyed to the data recipient was their name and the issuer. However, from the 
information about the issuer, further meta information can be derived, e.g., Inga is 
a Swedish citizen or Inga has a Swedish driver license. Not reporting the issuer (or 
any other information that was sent in the scenario) has been recorded as an error of 
omission, while reporting that more has been revealed has been recorded as an error 
of addition. Apart from the differences in UIs, there was also a difference in how 
the users gave their responses. In the card-based scenarios, the users were shown 
images of the source cards together with a table containing all the information on 
the cards and they were instructed to tick the information being conveyed. As no 
reference was made to cards in the attribute-based scenario, instead of showing the 
source cards, the participants simply got to write their response on a piece of paper. 
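
The scoring rule described above can be sketched as a small classification function. The names `SENT` and `classify` are introduced here for illustration; how an answer that both omits and adds information was scored is not stated in the text, so counting it as an error of addition is an assumption.

```python
# Ground truth for the scenario: only the name and the issuer were conveyed.
SENT = frozenset({"name", "issuer"})


def classify(reported) -> str:
    """Classify a participant's answer about what was sent to the data recipient."""
    reported = frozenset(reported)
    if reported - SENT:
        # Something was reported that was never sent (assumed to take precedence).
        return "addition"
    if SENT - reported:
        # Something that was sent is missing from the answer.
        return "omission"
    return "correct"


print(classify({"name", "issuer"}))
print(classify({"name"}))
print(classify({"name", "issuer", "photo"}))
```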
12.2.2 The Card-Based Approach


In order to investigate how an effective selection mechanism for a card-based
identity management system should be designed, we tested a number of different
alternatives in different iterations. The reason for the many iterations² was the
simple fact that very few of the participants understood the principle of
selective disclosure.

The selection mechanism of our first credential selection UI showed the full cards
and once the participants had chosen both a credit card and identity card it was
possible to continue to the summary page which contained only the selected
information. As none of the participants understood the concept of data
minimisation, we explored a number of alternative UIs, all with the central idea
to show what information was selected from the different source cards.
Additionally, during some of the test rounds, we instructed the users to create a
new temporary (virtual) card instead of selecting cards in order to move them out
of their current understanding of credentials.

Our first alternative UI was based on the idea of highlighting and explicitly
showing that some information is selected and that some is not. In order to do so,
we added a mouse-over state where the information about to be selected was shown
with little scissors around cut-outs and the background was blurred out (see
Figure 12.1 for an example).


[Screenshot: “PRIME - Send Personal Data” dialog, “Select Payment Information” step]

Fig. 12.1: The Cut out and blur UI with the Visa card in its mouse over state.


In the next iteration, we created a short animation where the cutout, when clicked,
moved from the source card to the virtual card while the source card dissolved. This
was done based on the idea that the error was due to users perceiving the grayed out
information as being sent but not being important. The animation, however, did not
help the users to understand selective disclosure, as our tests revealed.

² For a description of the reasoning behind the iterations see [Pri09a, Chapter 5].

Despite our efforts to highlight that specific data was being selected from the
source cards, users still overestimated the amount of data being sent. As one
possible explanation was that users, having seen the data, associated it with the
cards and thus thought it was being sent, we decided to completely hide all data
that was not sent and to show only the relevant data that was about to be selected
(see Figure 12.2 for an example).


[Screenshot: “Create PrimeLife Virtual Card” dialog with “Select Proof” entries for
Passport (Country: Sweden, Name: Inga Vainstein) and Driver's license (Issuer:
Vägverket)]

Fig. 12.2: UI that shows only relevant data.


Even though the full cards were not shown, the majority of users thought that all 
of the data they associated with the source cards were sent. Therefore we focused 
on making it very explicit that the data from the source cards was suppressed by 
utilising the familiar concept of blacking out information (See Figure 12.3 for an 
example). 

Still, most users believed they had sent all data from the source card. One
possible reason for this might be that users believe that the blacked out data is
being sent in encrypted form. A more practical problem with this design is the
fact that credentials that contain a lot of information (e.g., passports) take up
a lot of screen space, which makes them difficult to display if the transaction at
hand demands information from multiple sources.


[Screenshot: “Send PrimeLife Virtual Card” summary page with the credit card and a
Swedish driver's license (“KÖRKORT SVERIGE”)]

Fig. 12.3: The summary page of UI blacking out data that are not sent.


12.2.2.1 Test Results for the Card-Based Selection Paradigm 


All in all, we performed seven iteration rounds of tests with subsequent UI im- 
provements, where 5 users participated in each test round. Of the 35 users, only two 
understood that the only information they sent from the driver’s license or passport
was the name (Inga Vainstein) plus information about the card issuer (in the form of
the signature by the Swedish government or Swedish road authority). Further, three test
users understood the principle of selective disclosure in so far as they understood 
that the name and not all information was sent. However, they missed the fact that 
information about the card issuer was revealed as well. Thus, five users (14%) un- 
derstood, at least up to a point, the principle of selective disclosure with the card- 
based approach. 


12.2.3 The Attribute-Based Approach 


As the majority of errors during testing of the card-based UIs were due to users
not understanding the principle of selective disclosure, we wanted to know whether
users would perform better, or whether the errors would be different, if there was
no mention of cards in the UIs. Instead of informing the users that they had
imported credentials in the form of cards, we told the users that they had
imported validated attributes of information from the Swedish passport authority
and the Swedish road authority.

The first iteration of UIs contained two drop down lists where the user could 
select credit card and verifier of proof. Once the user had selected one of each, it 
became possible to click the “send” button. (See Figure 12.4 for an example). 


[Screenshot: dialog with the entry “Proof [Name: Inga Vainstein]:” and a “Select
Verifier...” drop-down list (one option: “Polisen”)]

Fig. 12.4: UI for selecting information and verifier.


A common user comment concerned the labels “Proof [Name: Inga Vainstein]” and
“verifier”. This was addressed in the following iteration, where the user was
instead asked to select the issuer of the proof. Although these changes led to a
substantially higher number of correct responses in terms of selective disclosure,
the users now failed to understand that the information about the issuers was also
being sent.


12.2.3.1 Test Results for the Attribute-based Selection Paradigm 


All in all, we performed six iterations of tests with an average of 8.5 users in each.
Of the 48 users, 22 understood the selective disclosure and correctly stated that
name and issuer were sent. Another 10 users missed the disclosure of the issuer
but understood that nothing more than their name was sent. In total, 32 users (66%)
fully or partially understood the principle of selective disclosure. Another
interesting observation is that amongst the users who used the first
attribute-based UI with the “select verifier” instruction, some interpreted the
instruction as meaning that the data was sent via the verifier, who would then be
able to trace all transactions made by the user. Hence, they got the wrong
impression that the verifier (e.g., the police or the Swedish road authorities)
could in the end trace their activities.


12.2.4 Results of the User Studies


In order to answer the main question of this study, namely, whether the user’s
mental model affects the understanding of selective data disclosure, a binomial
test was performed between the results of the card-based approach and the
attribute-based approach. The results of the binomial test showed that there is a
significant difference (p < 0.001) between the 14% correct response rate in the
card-based approach (n = 35) and the 66% correct response rate in the
attribute-based approach (n = 48). It should be noted that in this analysis users
omitting the issuer have also been counted as correct (for specific proportions
see Table 12.1).


Table 12.1: Proportions (and count) of errors of omission, correct responses and
errors of addition in the card-based and the attribute-based approach.


          | Omission   Correct    Addition
Card      | 9% (3)     6% (2)     86% (30)
Attribute | 21% (10)   46% (22)   33% (16)
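
The counts in Table 12.1 can be used to recompute the reported proportions and to reproduce a significance check. The chapter does not state how the binomial test was set up; the sketch below assumes a one-sided exact test of the card-based result against the attribute-based rate taken as a fixed reference, which is only one plausible reading. All names in the code are introduced here for illustration.

```python
from math import comb

# Counts taken from Table 12.1.
counts = {
    "card": {"omission": 3, "correct": 2, "addition": 30},
    "attribute": {"omission": 10, "correct": 22, "addition": 16},
}


def understood(row: dict) -> int:
    """Users counted as correct in the analysis: fully correct or issuer omitted."""
    return row["correct"] + row["omission"]


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_card = sum(counts["card"].values())        # 35 users
n_attr = sum(counts["attribute"].values())   # 48 users
p_attr = understood(counts["attribute"]) / n_attr

# Probability of seeing at most 5 of 35 correct if the true rate were p_attr.
p_value = binom_cdf(understood(counts["card"]), n_card, p_attr)

# Ratio of errors of addition to errors of omission in each approach.
ratio_card = counts["card"]["addition"] / counts["card"]["omission"]
ratio_attr = counts["attribute"]["addition"] / counts["attribute"]["omission"]

print(f"card: {understood(counts['card']) / n_card:.1%}, attribute: {p_attr:.1%}")
print(f"p = {p_value:.2e}, ratios: {ratio_card}, {ratio_attr}")
```

Under this reading the p-value lies far below the reported threshold of 0.001.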


A noteworthy fact is the difference between the two approaches in terms of errors.
Whereas there were ten times as many errors of addition as errors of omission in
the card-based approach, the same factor for the attribute-based approach was 1.6.
The main source of errors of addition in the card-based approach was the notion
that all information of the source card is sent. A few users also suggested that
Amazon now even knows what they look like as well as their handwriting, indicating
that these users believed an exact copy of the source card was sent. A contrasting
mental model artifact was observed in the attribute-based approach. Here, instead
of the concept of the card, it was the concept of the verifier that led users down
the wrong path, believing that data was sent via the verifier who would then gain
knowledge of the transaction. Moreover, in the attribute-based approach, the main
source of errors of addition was personal identification number and address, even
though these attributes were not part of the eShopping scenario. As the use of
personal identification numbers is however very common in Sweden, this error is
most probably a cultural artifact.³ Being a “very Swedish” error or not, it still
shows that the users had the wrong mental model with regard to privacy-enhanced
e-Shopping with anonymous credentials.


12.3 Conclusions & Future Work 


The main result of this study is that the mental models of users affect their
understanding of the selective disclosure property of anonymous credentials.
Furthermore, we have shown that the use of an attribute-based approach to the
design of a credential selection interface leads to fewer misunderstandings in
terms of data disclosure than when the UI design uses the card-based approach. It
should however be stated that even though users made fewer errors of addition in
the attribute-based approach, we do not know whether this is because they actually
understood the principle of selective disclosure or because they simply looked at
the UI and made their inferences. The results of the attribute-based user tests
indicate that it is in fact the latter, as a number of users did not understand
that the issuer was revealed or believed that their personal identification number
was revealed as well. A relevant question is of course what the objective of the
UI is: should it educate the user about the properties of anonymous credentials,
or is it enough that the users understand the end result of using such a system?

³ In Sweden, personal identification numbers are extensively used both in contacts
with government agencies and private companies in all situations where an
individual is to be identified.

In this study we did not include any aspects of unlinkability. Intuitively, however,
the attribute-based approach seems more appropriate than the card-based approach
for mediating the unlinkability property of anonymous credentials, as in the real
world the different displays of physical cards are linkable. Further research
needs to be conducted on this question.

As the key issue for the deployment of privacy enhancing technology such as 
the use of anonymous credentials is to have users understand the privacy-enhancing 
features, or at least not misunderstand them, the mental model invoked by the UI 
is of great importance. Hence, the card-based approach can only be deployed
successfully if we can bridge the gap between the card-in-your-wallet approach and
the functionality of anonymous credentials.

One possible way to solve this problem is to focus on the main difference between
the cards and the anonymous credentials, namely that the latter can be adapted
to fit the user’s current needs. Future research planned for the last year of the
PrimeLife project will investigate how introducing the credential selection
mechanism as part of an adaptable eID system affects the user’s understanding of
data minimisation. This research will also include questions regarding the
unlinkability of credentials. Last but not least, in addition to investigating the
users’ mental models with regard to their understanding of data minimisation, we
will also investigate users’ understanding of how their personal data flows between
the credential verifier and the services side. Taken together, users can fully value
the privacy features of anonymous credentials only if they understand their minimal
disclosure and unlinkability properties, and if they understand that personal data
can flow directly to a services side without the involvement of the credential issuer.


Chapter 13
Trust and Assurance HCI 


Simone Fischer-Hübner, Hans Hedbom, and Erik Wästlund


Abstract In this chapter, we present our HCI (Human Computer Interaction) work 
for mediating the degree of trustworthiness of services sides to end users and for 
enhancing their trust in PrimeLife-enabled applications. For this, we will present 
the user interface development work of a trust evaluation function and the PrimeLife 
Data Track. 


13.1 Introduction 


Trust plays an important role for PrimeLife, because users do not only need to trust 
their own platforms and user side identity management components to manage their 
data properly. They also need to trust communication partners and their remote set 
of platforms that receive personal data to process their data in a privacy-friendly 
manner and according to (business) agreements. Our previous HCI work within
the FP6 project PRIME, as well as research by others, had revealed that end
users often do not trust the claims about the privacy-enhancing features of
privacy-enhancing technologies (PETs) [PFHD*05], [SGB01]. Within PrimeLife, we
have therefore conducted research on how user interfaces can help to communicate
the trustworthiness of PrimeLife technologies and assurance information to end
users.

In this context, we have first investigated social trust factors and, based on this, 
developed and tested user interface (UI) mockups for a trust evaluation function, 
which uses a multi-layer design for informing users about the evaluation of a ser- 
vices side’s trustworthiness in regard to privacy and business reliability. 

Additionally, we have developed the Data Track, which provides the user with a
history function documenting what personal data the user has revealed under which
conditions, and includes online functions for a user to exercise her rights to
access/correct/delete her data at the remote data controller’s side (cf. Art. 12 EU
Directive 95/46/EC). As research on social trust factors has shown, trust in online
transactions can be increased if the transactions are transparent and reversible
(see below). The Data Track is a transparency-enhancing tool, which can thus
increase the end users’ trust and assurance that their personal data are handled
properly by others and can, with its online functions, enhance the user’s control
over her personal spheres.

In this chapter, we first discuss social trust factors (Section 13.2), which have
motivated our design of the trust evaluation function and the Data Track. We then
present the design and tests of the trust evaluation function (Section 13.3).
Results of these first two sections have also been presented earlier in [FHFL09]. In
Section 13.4, we discuss the user interface designs for the Data Track that we have
developed for the PrimeLife project, as well as the results of the usability tests
that we conducted at Karlstad University and at the Centre for Usability Research
in Austria. Finally, conclusions are drawn at the end of this chapter.


13.2 Social Trust Factors 


In this section, we investigate suitable parameters corresponding to social trust fac- 
tors for measuring the actual trustworthiness of a communication partner in terms 
of privacy practices and of the reliability as a business partner and for establishing 
reliable trust. 

Social trust factors in the context of e-Commerce have already been researched 
by others. For instance, [TZY01] showed that for ordinary users to feel secure when 
transacting with a website, the following factors play a role: 1. the company’s rep- 
utation, 2. their experiences with the website, and 3. recommendations from inde- 
pendent third parties. 

Riegelsberger et al. [RSM05] present a trust framework that is based on contextual
properties (based on temporal, social and institutional embeddedness) and the
services side’s intrinsic properties (ability, motivation based on internalised
norms, such as privacy policies, and benevolence) that form the basis of trustworthy
behaviour. Temporal embeddedness can be signalled by visible investments in the
business and the side, e.g. as visualised by professional website design, which can
also be seen as a symptom of the vendor’s intrinsic property of competence or
ability to fulfil a contract. Social embeddedness, i.e. the exchange of information
about a side’s performance among users, can be addressed by reputation systems.
Institutional embeddedness refers to the assurance of trustworthiness by
institutions, as done with trust seal programs.

A model of social trust factors, which was developed by social science researchers
in the PRIME project [LLPH05], [ACC*ce], has identified five layers on which trust
plays a role in online services: socio-cultural, institutional, service area,
application, and media. Service area-related trust aspects, which concern the trust
put in a particular branch or sector of economic activity, as well as socio-cultural
trust aspects, cannot, however, be directly influenced by system designers. More
suitable factors for establishing reliable trust can be found on the institutional
and application layers of the model, which also correspond to trust properties
(the contextual property based on institutional embeddedness as well as certain
intrinsic properties of a web application) of the framework by Riegelsberger et al.
[RSM05]. As discussed by Leenes et al. [ACC*ce], on the institutional layer, trust
in a service provider can be established by monitoring and enforcing institutions,
such as data protection commissioners, consumer organisations and certification
bodies. Also, on the application layer, trust in an application can be enhanced if
procedures are clear, transparent and reversible, so that users feel in control.

This latter finding corresponds to the results of the British Trustguide project
[LCP06], which provides guidelines on how cybertrust can be enhanced and likewise
concludes that increased transparency brings increased user confidence.


13.3 A Trust Evaluation Function 


In this section, we will present a trust evaluation function that has been developed
within the PrimeLife EU project (see also [Pri09b], [FHFL09]). This function has
the purpose of communicating reliable information about trustworthiness and as- 
surance (that the stated privacy functionality is provided) of services sides. For the 
design of this trust evaluation function, we have followed an interdisciplinary ap- 
proach by investigating social factors for establishing reliable trust, technical and 
organisational means, as well as HCI concepts for mediating evaluation results to 
the end users. 


13.3.1 Trust Parameters Used 


Taking the results of the studies on social trust factors presented in Section 13.2
as well as available technical and organisational means into consideration, we have
chosen the following parameters for evaluating the trustworthiness of communication
partners. They mainly refer to the institutional and application layers of the
social trust factor model. Information provided by trustworthy independent
monitoring and enforcing institutions, which we are utilising for our trust
evaluation function, comprises:


• Privacy and trust seals certified by data protection commissioners or independent
certifiers (e.g., the EuroPrise seal, the TRUSTe seal or the ULD Gütesiegel);

• Blacklists maintained by consumer organisations (such blacklists exist, for
example, in Sweden and Denmark);

• Privacy and security alert lists, such as lists of alerts raised by data protection
commissioners or Google’s anti-phishing blacklist.


It is important to note that blacklists and alert lists have to be carefully chosen
to ensure that blacklisting and alert listings are based on fair decisions. For
this, it has to be checked how reliable the list operators are, who is controlling
them and what the criteria for blacklisting or alert listing are. The European
Consumer Centres have launched a web-based solution, Howard the owl, for checking
trust marks and other signs of trustworthiness, which could be used as well when
evaluating a web shop.

Static seals can be complemented by dynamic seals (generated in real time) conveying
assurance information about the current security state of the services side’s
system and its implemented privacy and security functions. Such dynamic seals
can be generated in real time by an “Assurance Evaluation” component that has
been implemented within the PRIME framework [Pea06]. Dynamic seals that are
generated by tamper-resistant hardware can be regarded as third-party endorsed as- 
surances, as the tamper-resistant hardware device can be modeled as a third party 
that is not under full control of the services side. Such dynamic assurance seals 
can measure the intrinsic property of a side’s benevolence to implement privacy- 
enhancing functionality. Such privacy-enhancing functionality can also comprise 
transparency-enhancing tools that allow users to access, and to request to rectify or 
delete their personal data online (cf. Data Track described in Section 13.4), which 
will allow users to “undo” personal data releases and to feel in control. As dis- 
cussed above, this is an important prerequisite for establishing trust. For our trust 
evaluation function, we therefore used dynamic assurance seals informing about the 
PrimeLife privacy-enhancing functions that the services side’s system has imple- 
mented. Also, reputation metrics based on other users’ ratings can influence user 
trust, as discussed above. Reputation systems, such as for instance the one on eBay, 
can, however, often be manipulated by reputation forging or poisoning. Besides, the 
calculated reputation values are often based on subjective ratings by non-experts, 
for whom it might be difficult to judge the privacy-friendliness of communication 
partners. We have therefore not considered reputation metrics for the PrimeLife trust 
evaluation function. 

Following the process of trust and policy negotiation of the PRIME technical
architectures (on which PrimeLife systems are also based), privacy seals, which are
digitally signed by the issuing institution, as well as dynamic assurance seals can
be requested directly from a services side (see steps 4-5 in Figure 13.1), whereas
information about blacklisting and alerts needs to be retrieved from the third-party
list providers (see steps 6-7 in Figure 13.1). After the user requests a service
(step 1), the services side replies with a request for personal data and a proposal
of a privacy policy (step 2). To evaluate the side’s trustworthiness, the user can
then in turn request trust and assurance data and evidence from the services side,
such as privacy seals and dynamic assurance seals (steps 4-5), and information about
blacklisting or alerts concerning this side from alert list or blacklist providers
(steps 6-7). Information about the requested trust parameters is then evaluated at
the user side and displayed via the trust evaluation user interfaces along with the
privacy policy information of the services side within the “Send Personal Data?”
dialogue window (see below), which is also used to solicit the user’s informed
consent to releasing the requested data under the stated policy. Based on the trust
evaluation results and policy information, the user can then decide on releasing the
requested personal data items and possibly adopt the proposed policy, which is then
sent to the service provider (step 8).


[Figure: message flow between the user side and the services side. Recoverable
labels: 1. Request of service. 2. Data request and privacy policy proposal (“Provide
me with: name, address, birth date, email address, credit card details, personal
preferences on X, ...”). 5. “We can offer the following: a EuroPrise privacy seal
issued by ULD; we are running a PrimeLife-enabled system including data minimisation
support and online functions for exercising user rights; we have encrypted data
storage ...”. 8. Data handling policy together with the requested data.]

Fig. 13.1: The steps for Privacy and Trust negotiation as proposed by PrimeLife. 
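
The division of labour in steps 4-7 can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the names `TrustData` and `collect_trust_data`, the dictionary format of the seal offer and the list-provider names are hypothetical and not part of the PRIME or PrimeLife architecture, and the verification of the digital signatures on the seals is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class TrustData:
    """Trust parameters collected at the user side (hypothetical structure)."""
    privacy_seals: list = field(default_factory=list)   # from the services side, steps 4-5
    dynamic_seals: list = field(default_factory=list)   # from the services side, steps 4-5
    blacklisted_by: list = field(default_factory=list)  # from list providers, steps 6-7
    alerts_from: list = field(default_factory=list)     # from list providers, steps 6-7


def collect_trust_data(site, seal_offer, blacklists, alert_lists):
    """Combine the services side's seal offer with third-party list lookups.

    seal_offer  -- dict returned by the services side (assumed format)
    blacklists  -- mapping: list provider name -> set of blacklisted sites
    alert_lists -- mapping: list provider name -> set of sites with alerts
    """
    data = TrustData(
        privacy_seals=list(seal_offer.get("privacy_seals", [])),
        dynamic_seals=list(seal_offer.get("dynamic_seals", [])),
    )
    # The lists are queried at third-party providers, not at the services side.
    data.blacklisted_by = sorted(p for p, sites in blacklists.items() if site in sites)
    data.alerts_from = sorted(p for p, sites in alert_lists.items() if site in sites)
    return data


if __name__ == "__main__":
    offer = {"privacy_seals": ["EuroPrise"], "dynamic_seals": ["encrypted data storage"]}
    print(collect_trust_data(
        "www.welovemovies.com",
        offer,
        blacklists={"consumer-org-list": {"shop.example"}},
        alert_lists={"dpa-alert-list": set()},
    ))
```

Because the blacklist and alert lookups do not depend on the seal offer, a services side that is not PrimeLife-enabled (an empty offer) can still be evaluated on these two parameters.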


13.3.2 Design Principles and Test Results 


For the design of our trust evaluation function mock-ups, we applied the following
design principles, comprising general HCI principles as well as principles that
specifically address challenges and usability problems encountered in our previous
usability tests:


• Use a multi-layered structure for displaying evaluation results, i.e. trust
evaluation results should be displayed in increasing detail on multiple layers in
order to prevent information overload for users who are not interested in the
details of the evaluation. Our mockups are structured into three layers: a short
status view with the overall evaluation result for inclusion in status bars and in
the “Send Personal Data?” window, which also displays the services side’s short
privacy policy and data request (1st layer, see Figure 13.2); a compressed view
displaying the overall results within the categories privacy seals, privacy &
security alert lists, support of PRIME functions and blacklisting (2nd layer); and
a complete view showing the results of sub-categories (3rd layer, see Figure 13.3).

[Figure: “Send Personal Data?” window showing the user’s data (name, address, credit
card number), the requesting services side (We Love Movies Inc.,
www.welovemovies.com) with the trust evaluation result for this site, the purposes
of the data request (payment and delivery of the ordered film), and a link to the
full privacy policy.]

Fig. 13.2: “Send Personal Data?” window displaying the overall trust evaluation
result (1st Layer).


• Use a selection of meaningful overall evaluation results. For example, in our
mockups, we use a trust meter with a range of three possible overall evaluation
results whose names convey their meaning (which should be more meaningful than, for
instance, the percentages used by some reputation metrics). The three overall
results that we are using are (see trust meter in Figure 13.2):


– “Poor” symbolised with a sad-looking emoticon and red background colour
(if there are negative evaluation results, i.e. the side is blacklisted or appears
on alert lists);

– “Good” symbolised with a happy-looking smiley and green background colour
(if there are no negative, but some positive results, i.e. the side has a seal or
supports PrimeLife functions and is not appearing on black/alert lists);

– “Fair” symbolised with a white background colour (for all other cases, i.e. the
side has no seal, does not support PrimeLife functions, and does not appear
on black/alert lists).


• Make clear who is evaluated - this is especially important because, as mentioned
above, our previous usability tests have revealed that users often have difficulty
differentiating between the user side and the services side [PFHD*05]. Hence, the
user interface should make clear by its structure (e.g., by surrounding all
information referring to a requesting services side, as illustrated in the “Send
Personal Data?” window in Figure 13.2), and by its wording, that the services side
and not the user side is evaluated. If this is not made clear, a bad trust
evaluation result for a services side might also lead to reduced trust in the user
side identity management system.

• Structure the trust parameters visible on the second and third layers into the
categories “business reliability” (comprising the parameter “blacklisted”) and
“privacy” (comprising the parameters of security & privacy alert lists, privacy
seals and PrimeLife function support). This structure should illustrate that the
trust parameters used have different semantics and that scenarios in which companies
are “blacklisted” for bad business practices even though they have a privacy seal
and/or support PrimeLife functions do not have to be contradictory, as the
parameters refer to different aspects of trustworthiness.

• Inform the user without unnecessary warnings - our previous usability tests
showed that extensive warnings can be misleading and can even result in users 
losing their trust in the PrimeLife system. It is a very difficult task for the systems 
designer to find a good way of showing an appropriate level of alerting: for in- 
stance, if a web vendor lacks any kind of privacy seal, this in itself is not a reason 
for raising alarm, as most sites at present do not have any kind of trust sealing. 
We also did not choose the colour “yellow” for our trust meter for symbolising 
such an evaluation result that we called “fair” (i.e. we did not use the traffic light 
metaphor), as yellow already symbolises a state before an alarming “red” state. 
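
The rule behind the three overall results of the trust meter described above can be written down compactly. The following Python sketch merely restates that rule; the function name and its four boolean parameters are our own assumptions for illustration and are not taken from the PrimeLife mock-ups.

```python
def overall_trust_result(blacklisted, on_alert_list, has_seal, supports_primelife):
    """Return the overall trust meter result for a services side.

    Any negative result (blacklisted or on an alert list) dominates; otherwise
    any positive result (seal or PrimeLife function support) yields "Good";
    a side with neither negative nor positive results is rated "Fair".
    """
    if blacklisted or on_alert_list:
        return "Poor"
    if has_seal or supports_primelife:
        return "Good"
    return "Fair"


if __name__ == "__main__":
    # A side without any seal and without list entries is "Fair", not alarming.
    print(overall_trust_result(False, False, False, False))
    # A blacklisted side stays "Poor" even if it holds a privacy seal.
    print(overall_trust_result(True, False, True, True))
```

Note that only the overall (1st layer) result is collapsed in this way; on the more detailed layers, business reliability and privacy parameters are reported separately, so a blacklisted side with a privacy seal is not presented as a contradiction.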


13.3.3 Test Results 


Usability tests for three iterations of our PrimeLife trust evaluation function
mock-ups were performed in the Ozlab testing environment of Karlstad University in
two rounds with ten test persons each and one round with 12 test persons.

The positive results of the usability tests can be summarised as follows: 


• Most participants seemed to understand the “Send Personal Data?” user interfaces
and the presented top-level trust evaluation results quite easily. They thought
that the UI was explicit and clear, with no distracting objects. The participants
liked that the requested data were presented to them explicitly in “Send Personal
Data?” before they decided whether or not to send their data.

• The “Good” and “Poor” emoticons on the top level were also clearly understood by
all users. Only the “Fair” emoticon (which we called “Not bad” and “OK” in the
first two test rounds) was found confusing by some test participants.

• The colours red and green in the prototype (both on icons and over text) were all
understood correctly by the participants.

• The icon for alarming the users was also correctly understood. This was not
further evaluated in tests 2 and 3.

• As many as 14 out of the 20 participants in tests 1 and 2 liked the function they
tested being called “Trust Evaluation.”


[Figure: “Trust Evaluation - PrimeLife 0.2” window with the sections “Evaluated
Site” (We Love Movies, www.welovemovies.com, “has been evaluated according to your
trust settings”), “Summary Result” and “Detailed Result”. The detailed result lists,
under “Privacy Reliability”: not mentioned in security & privacy alert lists, TRUSTe
seal, credentials, automatically readable privacy policy, user obligation
management, functions for exercising rights; and under “Business Reliability”: not
on selected blacklists. A “Collapse (Hide Complete View)” control closes the view.]


Fig. 13.3: Complete view of the Trust Evaluation Function (3rd Layer). 


• All but one participant in tests 1 and 2 said in the interviews that they would
like to use a PrimeLife prototype including a Trust Evaluation function that is
similar to the one that was tested.

• Nearly all participants understood that the services side, and not the user side,
was evaluated.


However, the tests also revealed a couple of usability issues that need to be
addressed in our next iteration of mockups:

• The more detailed trust evaluation results on the second and third layers were
harder to understand for most test persons. The most difficult evaluation result
for the test participants to interpret was the detailed “Fair” evaluation result:
several participants stated that the website was “not evaluated” or that “No
evaluation is performed since everything is neutral”.

• Some test users were also confused about how trust evaluation can work if the
services side is not PrimeLife-enabled. Hence, users need to be better informed,
via the interface or other means, that information about the services side’s
trustworthiness can mostly be obtained from third parties and does not require
PrimeLife support at the services side.


Moreover, it is interesting to note that some participants took a bad trust
evaluation result on the parameter “Blacklisting” more seriously than one on
“Privacy alerts.” One comment from an interview with a test person was: “Alerts are
warnings, but when you are blacklisted then it is really serious - this makes you
think twice before sending my data.”


13.4 The Data Track 


The design of the Data Track is motivated by the finding that transparency and
reversibility of transactions enhance user trust (see above). The idea of the Data
Track is to give users an overview of the data that they have released in different
web applications and of the conditions under which these data were released. The
Data Track should also help users to access remotely stored data about them and to
request or perform changes and deletion of these data. Thus, in essence the Data
Track has two functions, a history function and a transparency function. The Data
Track itself is a viewer that draws information from three databases (see
Figure 13.4). These databases store information on the PII (personally identifying
information) sent, the sessions in which the data were sent and the changes that
have been made to the data. The Data Track does not itself store the data in the PII
and session databases when they are first sent, but rather relies on some other
entity to do the storage. In our initial setup, the PRIME Core from the PRIME
project stores the PII and session information when it is sent over the Web (see
Figure 13.4). The Data Track does, however, modify the data and store the changes to
the data if the user requests changes or deletion. For a more detailed description,
please see [Pri10c].
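
To make the role of the three stores more concrete, the following Python sketch models them as plain in-memory records and joins them into the rows of a history view. This is a minimal sketch under our own assumptions: the class names, field names and the function `session_view` are hypothetical, and the actual schema of the PRIME Core databases is described in the cited deliverable, not here.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One entry of the session store."""
    session_id: int
    controller: str   # to whom the data was sent
    timestamp: str


@dataclass
class PiiRecord:
    """One entry of the PII store."""
    session_id: int
    attribute: str    # type of information, e.g. "First name"
    value: str
    purpose: str


@dataclass
class Change:
    """One entry of the change store, written by the Data Track itself."""
    session_id: int
    attribute: str
    action: str       # "change" or "delete"
    new_value: str = ""


def session_view(session_id, sessions, pii, changes):
    """Join the three stores into the rows shown for one session."""
    session = next(s for s in sessions if s.session_id == session_id)
    rows = []
    for rec in pii:
        if rec.session_id != session_id:
            continue
        requested = [c.action for c in changes
                     if c.session_id == session_id and c.attribute == rec.attribute]
        rows.append({
            "controller": session.controller,
            "attribute": rec.attribute,
            "value": rec.value,
            "purpose": rec.purpose,
            "requested_changes": requested,
        })
    return rows
```

In this sketch, only the change store is ever written by the viewer, which mirrors the description above that the PII and session information is stored by another entity when the data is first sent.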

As people engage in many transactions that involve multiple providers simulta- 
neously, the implementation of a usable Data Track is difficult from an HCI per- 
spective. Providing users with easy search tools for finding relevant records about 
past data disclosure is one example. Showing users that they can access their data 
on remote servers is another, which is especially difficult as users have little or no 
experience with this type of online access functions. 

[Figure: the Data Track as a viewer on top of its databases (including the PII DB),
connected to the “Data Track server side Webservices” via change, delete and
retrieve requests.]

Fig. 13.4: Conceptual view of Data Track system as part of the PRIME core.


Within the scope of the PrimeLife project, we have conducted work for enhancing
the functionality and usability of the Data Track. For this, we have specially
developed and tested user-friendly search functionalities plus online functions,
which now allow end users to access, correct or delete their data at the remote
services side (as far as permitted). For the development of a usable Data Track, we
have followed an iterative design based on a cyclical process of UI prototyping,
usability testing, and refinements of the Data Track user interfaces.


13.4.1 Use of the Data Track 


When a user opens the Data Track, she is presented with a list of sessions in
which she has released data. This view contains functionality for searching and
filtering on specific data given in the sessions. Clicking on one of these sessions
opens a session window (see Figure 13.5). The session window tells the user what
type of information was sent, the actual value used, for what purpose and to whom
the information was sent. By pressing the policy button (see Figure 13.5), the user
also gets information on the agreed-upon privacy policy for this session. This
window also gives the user the possibility to retrieve or delete remote information
(if this has been permitted by the data controller). By pressing the “retrieve data
from X” button, the user instructs the Data Track to contact the service’s web page
through the “Data Track server side Webservices” and retrieve the data stored under
this session in the Session and PII Database (DB) on the remote side (i.e., the data
that the data controller has on the user for this session) (see Figure 13.4). If the
user presses the summary button, a Summary window pops up (see Figure 13.6). This
window contains similar information to the session window but displays all the
information that the user has sent during all sessions with this specific data
controller. The window also contains more fine-grained functions for deleting and
changing specific values. In Figure 13.6, the “Retrieve All Data from X” button has
already been pressed, and the remote information is shown as well as the locally
stored information. Any mismatch (such as different, missing or non-sent data stored
on the remote side) between the sent and the remotely stored data is marked, so that
it can be easily spotted by the user (see Figure 13.6). This type of marking also
takes place in the Session window, but only for the data belonging to the same
session.


[Figure: session window showing the data controller’s contact information (URL:
http://www.skandia.se; date: 2009-06-24 13.43.00; organization, street, city and
country unknown), a “Privacy Policy” button, the buttons “Retrieve data from
Skandia” and “Delete all data from Skandia”, and the data values sent in the
session.]

Fig. 13.5: Session Window in the Data Track.


[Figure: summary window showing the contact information of the data controller
(name: Skandia; URL: http://www.skandia.se; date range: 2007-03-22 to 2009-09-15),
the buttons “Retrieve data from Skandia”, “Change data at Skandia” and “Delete data
at Skandia”, and a searchable table with the columns Data, Sent, Verifier Sent,
Remotely Stored Data, Remote Stored Verifier and Time Stamp. The rows list
attributes such as identifier, official family name, first name, password,
profession and street, each with the value sent, the remotely stored value and a
time stamp.]

Fig. 13.6: Summary Window in the Data Track. 
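
The marking of mismatches between sent and remotely stored data can be thought of as a comparison of two attribute-value tables. The Python sketch below shows one possible way to classify each attribute; the function name `mark_mismatches` and the status labels are assumptions made for illustration and do not describe the actual Data Track implementation.

```python
def mark_mismatches(sent, remote):
    """Compare locally logged values with the values retrieved from the remote side.

    sent, remote -- dicts mapping attribute name -> value
    Returns a dict mapping attribute name -> status, where the status is one of
    "match", "different", "missing remotely" or "stored remotely but never sent".
    """
    status = {}
    for attr in sorted(set(sent) | set(remote)):
        if attr not in remote:
            status[attr] = "missing remotely"
        elif attr not in sent:
            status[attr] = "stored remotely but never sent"
        elif sent[attr] != remote[attr]:
            status[attr] = "different"
        else:
            status[attr] = "match"
    return status


if __name__ == "__main__":
    # Illustrative values only.
    sent = {"First name": "Inga", "Street": "Lingonstigen 8", "Password": "secret"}
    remote = {"First name": "Inga", "Street": "Lingonstigen 9", "Profession": "Journalist"}
    for attr, state in mark_mismatches(sent, remote).items():
        print(f"{attr}: {state}")
```

A user interface would then highlight every row whose status is not "match". Restricting both dictionaries to the attributes of one session gives the corresponding marking for the session window.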


13.4.2 Test Scenarios & Test Setups 


In total, we have performed 5 usability test rounds at Karlstad University and at 
CURE with a total of 58 participants involved. The test participants were aged be- 
tween 19 and 56 years, 32 were male and 26 were female. All participants, except 
one, use the Internet on a daily basis. All participants shop online at least once a 
year. Most users stated they shop online once or several times a month and a few 
users stated they shop online once or several times a week. 

The general purpose of all the tests was to evaluate the users’ comprehension
of the Data Track UI. The first test was a pilot test to validate the test setup and
procedure, which was followed by three rounds of tests that differed only with
regard to the amount of instructions given. The last test round was a combined test
in which the participants were first asked to use the Credential Selector UI (see
Chapter 12) to perform a transaction, followed by the Data Track test. This was done
in order to see whether users would get a better grasp of the application if they
experienced both sending data and reviewing stored data first hand.

All tests followed the same procedure except with regard to the pre-test
instructions. A test session lasted between 30 and 60 minutes and contained the
following parts:


• Oral and written information about the test in general

• Pre-test questionnaire

• Pre-test introduction

  – Pilot and first round of tests: instruction movie

  – Second round of tests: very short oral presentation, after which users were
    given a few minutes to click around as a familiarising task

  – Third round (Data Track and Credential Selection combination): as above, but
    with additional information regarding the Credential Selection mock-ups

• Test person reads task information and interacts with prototype

  – In the third round (Data Track and Credential Selection combination), the
    participants were asked to use the Credential Selection UI to purchase a book
    from Amazon.com both before and after using the Data Track

• Post-test questions

• Online post-test PET-USES questionnaire

• Discussion about the given answers


During the test, users were asked to perform 15 different tasks¹, which together
would answer our questions regarding local search and online access functions.
More specifically, we wanted to know if users would:


• Understand how to search within the tables?

• Understand how to add columns to the main table?

• See the sort function of the main tables?

• See the expand function of the tables?

• Understand how to open the “summary card”?

• Understand how to update information via the “summary card”?


13.4.3 Results of the Usability Tests 


First of all, the results of the usability tests show that the usability of the Data Track 
is in a mature state. In general, most of the users succeeded in solving most of the 
tasks (see Figure 13.7). 


¹ A more detailed description of the user tasks and the results of the tests can be found in [Pri10c].

Fig. 13.7: The results of tasks 1-15 with regard to the participants completing the task
correctly, adding information or giving less information than the correct answer
required.


Having said this, it should be noted that users did not always accomplish this in 
an optimal way and that there is still, as always, room for improvement. Also, the 
tests showed a number of usability bugs that will not be reported here. 


13.4.3.1 Results Regarding the Search Functionality 


Since the Data Track is intended to be used over a very long period of time,
users will accumulate an abundance of data. Hence, if users cannot
successfully search through their data, the Data Track will not be of much use.
The main issue with regard to the table UI was the interpretation of the summary
line. Users either interpreted it as an occurrence, which made them overestimate the
number of times they had used a specific e-mail address, or missed the fact that it
was expandable and thus underestimated their usage of a given piece of data.
Other common mistakes concerned the functionality of the column headings: some
users did not understand that clicking the headers would sort the table, and almost
no users realised that it was possible to add columns with specific information.
The latter problem led users to open all summary cards and manually count
specific occurrences, something which is obviously not feasible after a couple of
years of usage.


13.4.3.2 Results Regarding the Online Access Functionality


The key feature of the online access functionality is the possibility to remotely
access and alter one’s own personal data at the services’ side. To test this, a
test user was asked to retrieve remotely stored data, evaluate it, and proceed with
correcting any errors found. A number of user errors during these tasks can
be seen as spillover effects from other areas, such as users not correctly understand- 
ing the difference between the summary cards and transaction cards and users not 
understanding that the issuer of verified data is sent together with the verified per- 
sonal information. However, the major issue of the online access functionality is that 
users do not clearly discriminate between data that is stored locally and remotely. 
Thus, instead of retrieving remotely stored data and evaluating it, the majority of 
users rather looked at a transaction card. Although this shows the user what data 
has been sent in a given transaction and gives the user the possibility to change data 
(e.g. in case of changed address), it does not let the user see a summary of all data 
aggregated by a service from multiple transactions and in the worst case from other 
sources. 


13.4.4 Discussion of Data Track Usability Tests 


On a general level, the results of the usability tests of the Data Track show that, with 
some exceptions, users have little trouble navigating the Data Track and finding 
information that is stored locally. Especially noteworthy is the use of the summary 
card, which all users understood correctly, and the table search function, which was 
also widely understood. The users’ problems with the Data Track can be divided 
into two areas, namely UI problems and mental model problems. 

With regard to UI problems, the main issue is the summary rows in the tables.
The idea of the summary row is to show that the user has sent information to a given 
recipient. However, the problem is that users often do not understand that this is a 
summarising heading of possibly multiple attributes and that only the last value is 
being shown. This results in users not expanding the row and thus missing a lot of 
information that has been sent to the recipient. Quite the opposite has also occurred, 
namely that users have interpreted the summary row as a separate transaction mak- 
ing them overestimate the amount of data they have sent to the recipient. 
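
The two misreadings of the summary row can be made concrete with a small sketch. The record layout and the function name `summarise` are assumptions chosen for illustration and are not taken from the Data Track implementation; the sketch only shows why a summary row is neither a single transaction nor the complete history.

```python
from collections import defaultdict

# Hypothetical disclosure records: (recipient, attribute, value, ISO date).
records = [
    ("Webshop", "e-mail", "john@doe.eu", "2010-01-05"),
    ("Webshop", "e-mail", "john@doe.eu", "2010-02-11"),
    ("Webshop", "e-mail", "j.doe@example.org", "2010-03-02"),
]


def summarise(rows):
    """Group rows per (recipient, attribute); the summary shows only the last value."""
    groups = defaultdict(list)
    for recipient, attribute, value, date in rows:
        groups[(recipient, attribute)].append((date, value))
    summary = {}
    for key, entries in groups.items():
        entries.sort()  # ISO dates sort chronologically as plain strings
        summary[key] = {
            "shown_value": entries[-1][1],  # what the collapsed row displays
            "occurrences": len(entries),    # what only the expanded row reveals
        }
    return summary


row = summarise(records)[("Webshop", "e-mail")]
print(row)
```

A user who reads the collapsed row as one transaction counts a single disclosure where three took place, and a user who does not expand it never sees the earlier value.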

With regard to issues based on users’ mental models, the key problem is that
users often do not distinguish between service and client side. This results in users 
not retrieving data from the service side in order to verify what information they 
have stored. Thus, tasks where the users can see the incorrect data locally have been 
satisfactorily solved, while tasks that depend on users retrieving remotely stored 
data have been more difficult. 

Lastly, the combined tests did not show any reliable effect on users’ understanding
of the Data Track. However, three out of eight participants who had overestimated
the amount of data they had sent to Amazon with the Credential Selector
understood what they had actually sent after they had used the Data Track. Thus,
using applications such as the Data Track and Credential Selector in combination
can help users get into the right mental model.


13.5 Conclusions 


In this chapter, we presented the HCI work on two different functions that we have 
developed within PrimeLife for enhancing reliable trust by end users, namely the 
trust evaluation function and the Data Track. 

The tests of the trust evaluation function clearly showed that such a function is 
much appreciated by end users. The presentation of overall evaluation results on 
a top level, especially the green and red emoticons, as well as the fact that the ser- 
vices side was evaluated, were well understood. Some users had problems, however, 
understanding the “neutral” evaluation result (in case a side has no seal, does not 
support PrimeLife functions, is not blacklisted and does not appear on alert lists). 
Hence, the illustration of “neutral” results is one of the most difficult issues and still
needs to be investigated further (see also [Pri09b]). 

The results of the Data Track usability tests show that users have little trouble 
using most parts of the Data Track that concern locally stored data. With regard to
locally stored data, it is mainly parts of the table UI that need to be improved. A
more challenging issue that needs further improvement is the problem of conveying
to the users what is happening on the client side versus what is happening on the
service side.


Abstract The PrimeLife Policy Language (PPL) has the objective of helping end
users make the data handling practices of data controllers more transparent, allow- 
ing them to make well-informed decisions about the release of personal data in 
exchange for services. In this chapter, we present our work on user interfaces for the 
PPL policy engine, which aims at displaying the core elements of a data controller’s 
privacy policy in an easily understandable way as well as displaying how far it cor- 
responds with the user’s privacy preferences. We also show how privacy preference 
management can be simplified for end users. 


14.1 Introduction 


PrimeLife aims at developing privacy-enhancing identity management systems for 
technically enforcing user control and informational self-determination. An important
prerequisite for user control in the context of privacy-enhancing identity management
is a privacy policy, which can inform a user about the personal data processing
practices of a data controller at the time when she is requested to disclose
personal data to that data controller. According to Art. 10 EU Data Protection Direc- 
tive 95/46/EC (DPD), a privacy policy should inform a data subject at least about the 
identity of the data controller, the purposes for which the data are intended as well 
as any further information such as the categories of data and recipients concerned, 
her right of access to and the right to rectify her data, needed to guarantee fair per- 
sonal data processing. However, in practice, privacy policies, whether posted on 
websites or contained within contractual texts, often include long complicated legal 
statements, which are usually neither read nor understood by the end users. Mak- 
ing privacy policies easily understandable and transparent is therefore an important 
challenge. In order to address this challenge, one emphasis of our HCI work within 
PrimeLife has been on privacy policy icons that support the display of a privacy 
policy through specially tailored graphical representations of policy aspects, which
will be discussed in the following chapter. Gross et al. [GA05] have shown that the
perceived clarity of a privacy policy increases positive reaction to the site and its 
goals. Hence, easily comprehensible and transparent privacy policies are not only 
a means for enhancing user control, but can also serve the interests of the service 
providers. 

Achieving better transparency of privacy policies is also the aim of privacy policy
matching, as enabled by PPL (see Section 17.2.1 of this book), which allows users to
make an informed decision on whether to use a service. In this context, a user can
state her privacy preferences to define under which conditions she would like to 
release what data. The user’s preferences can be compared to the data controller’s 
policy, so that the user can be informed in case her privacy preferences will not be 
met. 
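
As a minimal sketch of the matching idea, preferences and policy can both be reduced to a mapping from attribute to purposes. This flat representation and the function name `match` are assumptions made for illustration; they are not the PPL data model, which also covers aspects such as obligations and downstream usage.

```python
def match(preferences, policy):
    """Return (attribute, purpose) pairs the policy requests but the user has not permitted."""
    mismatches = []
    for attribute, purposes in policy.items():
        allowed = preferences.get(attribute, set())
        for purpose in sorted(purposes - allowed):
            mismatches.append((attribute, purpose))
    return mismatches


# Hypothetical user preferences and data controller policy.
preferences = {"address": {"delivery"}, "e-mail": {"contact"}}
policy = {"address": {"delivery"}, "e-mail": {"contact", "marketing"}}

mismatches = match(preferences, policy)
print(mismatches)
```

An empty result would mean that the policy stays within the user's preferences; any returned pair is something the user would have to be informed about before releasing data.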

For ordinary users, defining and adapting privacy preferences in such a way that 
they protect their privacy properly are complex and error-prone tasks that usually 
require some expertise about basic legal privacy concepts and principles. In the 
non-electronic world, equivalent tasks do not directly exist, which means that or- 
dinary users have no experience in defining and managing their privacy preferences. 
Without assistance, most users would very likely fail to define and use privacy pref- 
erences at all or could accidentally define or choose privacy preferences that are not 
as privacy-friendly as the users would like them to be. As security and privacy pro- 
tection are often secondary goals for ordinary computer users [WT99], it is indeed 
not realistic to assume that users will spend much time and effort on privacy config- 
urations. Hence, another major challenge, which we have addressed in PrimeLife, 
is the simplification of privacy preference management for end users. 

For achieving this, our approach in PrimeLife has been to provide options of pre- 
defined “standard” privacy preferences, from which a user can choose and which 
she can customise “on the fly.” If, for example, a data controller requests more data 
for a service than permitted by the user’s current privacy preferences and the user 
agrees to it, the user will at the same time be asked whether she wants to adapt 
her preferences and possibly save them under a new name or whether she wants 
to overrule her preferences only for this single event. The set of predefined privacy 
preferences should represent the users’ privacy interests and thus also includes the 
most privacy-friendly options for acting anonymously or for releasing as little infor- 
mation as needed for a certain service. For more advanced users, a preference editor 
is provided, which allows them, in a user-friendly way, to configure their individual 
privacy preferences. 
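
The “on the fly” choice between overruling once and adapting the preferences can be sketched as follows. The function `resolve_mismatch`, its parameters and the dictionary layout are hypothetical and serve only to illustrate the two options described above.

```python
import copy


def resolve_mismatch(groups, active, attribute, purpose, choice, new_name=None):
    """Apply the user's decision and return the name of the group active afterwards.

    choice == "once": overrule for this single event, stored preferences stay untouched.
    choice == "save": adapt the preferences, optionally saving them under a new name.
    """
    if choice == "once":
        return active
    if choice == "save":
        target = new_name or active
        updated = copy.deepcopy(groups[active])
        updated.setdefault(attribute, set()).add(purpose)
        groups[target] = updated
        return target
    raise ValueError("choice must be 'once' or 'save'")


# Hypothetical preference groups; the data controller also asks for e-mail/marketing.
groups = {"Minimal Data": {"address": {"delivery"}}}
active = resolve_mismatch(
    groups, "Minimal Data", "e-mail", "marketing", "save", new_name="My Shopping"
)
print(active, groups[active])
```

Saving under a new name leaves the predefined group unchanged, so the user can always fall back to it.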

For the development of our user interfaces (UIs), we have followed an iterative 
development approach with five iterations of UI development, testing and UI refine- 
ments and improvements. The emphasis of this chapter will be on the UIs developed 
within the fourth and fifth iteration cycles. 

In this chapter, we present our work on user interfaces for the PPL policy engine, 
which aims at making privacy policies easily understandable and transparent for end 
users and at simplifying privacy preference management for them. The remainder 
of this chapter is structured as follows. Section 14.2 will discuss related work and 
refer to our previous HCI work in PrimeLife. In Section 14.3, we will present policy
management and display user interfaces for the PPL policy engine that we have
developed in PrimeLife as part of the latest (4th and 5th) development iteration
cycles. The results of usability tests of these UIs will be presented in Section 14.3.3
and conclusions will be drawn in Section 14.3.4.


14.2 Related Work 


For making privacy policies and their core elements better understandable and more 
transparent, the Article 29 Data Protection Working Party has recommended provid- 
ing policy information in a multi-layered format. Three layers of policy information 
are suggested: The short privacy notice (layer 1) must offer individuals the core in- 
formation required under Art. 10 EU Directive 95/46/EC, which includes at least 
the identity of the controller and the purpose of data processing. In addition, a clear 
indication must be given as to how the individual can access additional information 
(of layers 2 and 3). The condensed notice (layer 2) includes all other relevant infor- 
mation required by Art. 10 of the Directive such as the recipients, whether replies to 
questions by the data controller are obligatory or voluntary, and information about 
the data subject’s rights. The full notice (layer 3) includes in addition to layers 1 and 
2 also “national legal requirements and specificities.” 

Our UI prototypes for policy display conform to the Art. 29 Working Party Rec- 
ommendation. 

Recent work on a “Nutrition Label” for privacy [KBCR09] has made proposals
on how to present information to be displayed in short privacy notices in a user- 
friendly manner, namely the types of information to be collected, how this informa- 
tion is used and with whom it may be shared. In particular, a visualisation technique 
for displaying policies in a two-dimensional grid with “types of information” that 
are requested as rows and purposes as columns was developed and well received by 
test users. Our policy display UIs use a two-dimensional table presentation, which 
is similar to the proposed grid, for summarising what data is released to whom and 
for what purposes. We have, however, adapted it to meet PPL-specific requirements 
and have done further changes, which will be discussed in Section 14.3.2.1. 
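
A two-dimensional grid of this kind can be sketched in a few lines. The cell marks (`x` and `-`), the function names and the sample requests are assumptions for illustration only and do not reproduce the icons used in [KBCRO9] or in our UIs.

```python
def build_grid(requests):
    """requests: list of (data_type, purpose) pairs. Returns rows, columns and cell marks."""
    rows = sorted({data_type for data_type, _ in requests})
    cols = sorted({purpose for _, purpose in requests})
    requested = set(requests)
    cells = [["x" if (d, p) in requested else "-" for p in cols] for d in rows]
    return rows, cols, cells


def render(rows, cols, cells):
    """Render the grid as fixed-width text, data types as rows and purposes as columns."""
    width = max(len(s) for s in rows + cols) + 2
    lines = ["".ljust(width) + "".join(c.ljust(width) for c in cols)]
    for name, marks in zip(rows, cells):
        lines.append(name.ljust(width) + "".join(m.ljust(width) for m in marks))
    return "\n".join(lines)


# Hypothetical data requests of a single data controller.
requests = [
    ("Address", "Delivery"),
    ("Card number", "Payment"),
    ("E-mail", "Contact"),
    ("E-mail", "Marketing"),
]
rows, cols, cells = build_grid(requests)
print(render(rows, cols, cells))
```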

Moreover, there has also been previous work on the usability of P3P user agents
[CGA06], and on the means for conveying information about the P3P privacy policy
compliance of websites to end users via the Privacy Finder P3P-enabled search
engine service [GECA06, TERS*T06]. This related work on user interfaces for P3P
and P3P-enabled tools does not, however, address requirements that can be derived
from the EU privacy legislation. The Privacy Bird is a P3P user agent that allows
the user to specify her privacy preferences regarding a website’s data handling
policy. The Privacy Bird uses the traffic light metaphor for displaying information about
the compliance of a site’s policy with the user’s preferences: If a site’s policy meets
the user’s preferences, a small green bird icon in the browser’s title bar emits a happy
tweet after the page has been loaded. If the site violates the user’s privacy
preferences, the bird icon turns red with a shrill warning when the page is first loaded.
For sites with no P3P policies, a yellow bird will appear. It is however question- 
able whether the traffic light is the right metaphor in this context, because having 
no privacy policy (symbolised by the yellow bird) can actually be regarded as worse 
than having a policy not matching the user’s preferences (symbolised by the red 
bird). For allowing users to specify their privacy preferences, a set of three prede- 
fined groups of preferences is provided, which can be customised by the user during 
the installation process and via the Privacy Bird menu. However, in contrast to the
approach that we have taken, the Privacy Bird does not allow for changing privacy
preference settings semi-automatically “on the fly.”
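
The traffic light mapping discussed above can be written down as a three-way decision. The function `bird_colour` and its arguments are hypothetical and only restate the behaviour described in the text; they are not part of the Privacy Bird software.

```python
def bird_colour(site_policy, matches):
    """Map a site's P3P status to an icon colour.

    site_policy is None when the site publishes no P3P policy;
    matches is a callable that checks a policy against the user's preferences.
    """
    if site_policy is None:
        return "yellow"
    return "green" if matches(site_policy) else "red"


# Hypothetical preference check: the user rejects any policy that mentions marketing.
def no_marketing(policy):
    return "marketing" not in policy


colours = [
    bird_colour({"delivery"}, no_marketing),
    bird_colour({"delivery", "marketing"}, no_marketing),
    bird_colour(None, no_marketing),
]
print(colours)
```

The sketch makes the criticism visible: a site without any policy ends up in the middle position, although it offers the user less assurance than a site whose stated policy merely mismatches.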

Also, in contrast to the PrimeLife Policy Language (PPL), for which our user in- 
terfaces were designed, P3P has several functional restrictions, in particular it lacks 
support for obligations, support for downstream data sharing as well as support for 
anonymous credentials. Moreover, P3P focuses on only one type of interaction
(web pages, HTTP).

Our previous HCI work in PrimeLife included three earlier development cycles 
of mockups for policy display and management, which were presented in PrimeLife 
Deliverables D4.3.1 [Pri08] and D4.3.2 [Pri10d]. The mockups of the first three
iteration cycles showed several usability problems, which led to some improvements
in the mockups of the fourth and fifth iteration cycles presented below. Besides,
they were not fully compatible with the PPL specification, as the specification was 
only available in the 2nd project year. 

As part of the third iteration cycle, a so-called PrimeLife Checkout Prototype
(PLC) was developed, which included a “Data to Transfer” sidebar (DTT). It
received very positive feedback when tested at Karlstad University, and will for this
reason be briefly presented here (see Figure 14.1). The objective of the DTT is to
give a user-friendly overview of which data will be transferred to which data
controllers for which purpose. The DTT consists of a box that displays a list of the data
controllers involved in a transaction, each represented by its own section. Sections
are separated by a horizontal line. Each data controller section begins with the data 
controller’s name, which is followed by a list of the data fields and the respective 
data values that will be transferred in the respective transaction. 

In addition, the purposes for the data storage/processing and the duration of data 
storage are displayed. At the end of each section, a hyperlink leads to the detailed 
privacy policy of the data controller in full text. The DTT has been implemented 
in JavaScript and is updated in real-time depending on what options were selected 
by the user. Besides the problem of not being PPL compliant, users easily got the 
impression that the PLC interface appeared as part of a webshop’s side rather than 
as part of the user’s side identity management system. This was another reason for 
starting with new designs in the 4th iteration cycles, which will be presented below.


Data to Transfer

to Webshop (Purchase):
  Address Line 1: Parcelstation 101
  Postcode: 12345
  City: Mycity
  Country: EU
  E-mail: John@doe.eu
Data will be processed and stored for the purpose of:
  Tax - 10 years
Detailed Privacy Policy

to DHL (Shipping):
  Address Line 1: Parcelstation 101
  Address Line 2: 12345678
  Postcode: 12345
  City: Mycity
  Country: EU
  E-mail: John@doe.eu
Data will be processed and stored for the purpose of:
  Tax - 10 years
  Delivery - 7 days
Detailed Privacy Policy


Fig. 14.1: The “Data to Transfer” sidebar.
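
The section layout of the sidebar can be sketched as a simple rendering function. The original DTT was implemented in JavaScript; the Python function `render_dtt`, the dictionary keys and the policy URLs below are hypothetical and only mirror the structure described above (one section per data controller, data fields, purposes with retention periods, and a policy link).

```python
def render_dtt(controllers):
    """Render one section per data controller, separated by a horizontal line."""
    lines = ["Data to Transfer"]
    for index, controller in enumerate(controllers):
        if index:
            lines.append("-" * 24)  # horizontal line between sections
        lines.append(f"to {controller['name']} ({controller['role']}):")
        for field, value in controller["data"]:
            lines.append(f"  {field}: {value}")
        lines.append("Data will be processed and stored for the purpose of:")
        for purpose, retention in controller["purposes"]:
            lines.append(f"  {purpose} - {retention}")
        lines.append(f"Detailed Privacy Policy <{controller['policy_url']}>")
    return "\n".join(lines)


# Hypothetical transaction data; the URLs are placeholders.
controllers = [
    {
        "name": "Webshop",
        "role": "Purchase",
        "data": [("Postcode", "12345"), ("E-mail", "John@doe.eu")],
        "purposes": [("Tax", "10 years")],
        "policy_url": "https://webshop.example/privacy",
    },
    {
        "name": "DHL",
        "role": "Shipping",
        "data": [("Address Line 1", "Parcelstation 101")],
        "purposes": [("Delivery", "7 days")],
        "policy_url": "https://shipping.example/privacy",
    },
]
sidebar = render_dtt(controllers)
print(sidebar)
```

Re-running the function whenever the user changes an option would correspond to the real-time update mentioned in the text.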


14.3 User Interfaces for Policy Management and Display 


This section will present and discuss the user interfaces that we have developed for
the PrimeLife Policy Language (PPL), including UIs for selecting the user’s privacy
preferences and for the “Send Data?” Dialog, which displays policy-related
information and allows users to customise their preferences “on the fly.” We have developed
these UIs within the fourth and fifth iteration cycles of our iterative design approach
for Policy Management and Display UIs.


14.3.1 Selecting Privacy Preferences


When users are browsing the Web, it is likely that they want to assign different levels 
of trust depending on what web site they are browsing or what they are doing. PPL 
allows the user to define privacy preferences that can specify what types of data they 
are willing to release under which conditions (in particular, for which purposes and 
- in future versions of PPL - for which retention period, etc.). As we cannot assume 
that ordinary users can easily define and manage their preferences, our approach in 
PrimeLife is to provide three predefined groups of privacy preferences (which we
have called “Nearly Anonymous,” “Minimal Data,” and “Requested Data”), from
which users can choose when contacting a data controller, and which they can
customise “on the fly.” In addition, the Privacy Tuner, which is a Policy Editor
developed for PPL, allows the user to create further different customised preference 
groups. 

At each point in time, the user has exactly one active group of privacy prefer- 
ences selected. If the user has not actively selected any group, the predefined group 
“Nearly Anonymous,” which is the most privacy-friendly one, is selected by default. 
We refer to the user’s active group as the user’s active privacy preferences. 

Our user interfaces allow for changing the active privacy preferences via a book- 
mark list, via a menu accessible from the location bar or via the “Send Data?” Dialog 
window. The latter will be presented in the next section. 

Figure 14.2 shows the menu as part of the bookmarks list (“Favorites” list in 
Explorer), which can be used for selecting either one of the predefined or the cus- 
tomised preference group named “My Shopping.” Selecting a privacy preference 
group in the subfolder of a bookmark makes the browser go to the bookmark in 
question using the selected group as the user’s active privacy preferences. This ap- 
proach is an advanced version of the bookmarks-based approach for selecting pri- 
vacy preferences, which was first proposed in [PFHD*05, Pet05]. It is analogous to 
how folders of bookmarks are displayed, where the bookmark is treated as a folder. 

In Figure 14.3 the icon of the user’s active privacy preferences is shown in the
location bar. Clicking the icon opens the menu, allowing the user to change
her active privacy preferences. Note that the location bar approach allows the user to
change privacy preferences when already on a webpage, while the bookmark-based
approach is limited to locations bookmarked by the user.

Since the main purpose of the user’s privacy preferences is to match these
preferences with a data controller’s policy, the “Send Data?” Dialog described in
Section 14.3.2 allows users to customise their settings “on the fly.”

These three different ways of selecting the active privacy preferences provide
users with easy means for changing their preferences before going to a website,
while browsing the site and as part of the process of disclosing data to the website.
Selecting privacy preferences from the bookmarks menu changes both the location
of the browser and the active privacy preferences. The icon in the location bar is
then updated to reflect the user’s new choice. Any later change to the privacy
preferences in the location bar will override any selection made when browsing to
the current site from the bookmarks menu. In the same way, any change to privacy
preferences in the “Send Data?” Dialog will override any selection made in the
bookmarks menu or location bar. Accessing the location bar while the “Send
Data?” Dialog is open will result in the “Send Data?” Dialog being updated as well.


[Screenshot: browser Bookmarks menu in which the eBay bookmark opens a submenu
with the entries Nearly Anonymous, Minimal Data, Requested Data, My Shopping and
Open Editor.]

Fig. 14.2: The menu as part of the bookmarks list.


[Screenshot: browser window showing the eBay start page, with the preference menu
(Nearly Anonymous, Minimal Data, Requested Data, My Shopping, Open Editor) opened
from the location bar.]

Fig. 14.3: Icon of the user’s active privacy preferences is shown in the location bar.
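
The selection behaviour described in this section amounts to a single piece of state in which the most recent selection wins and “Nearly Anonymous” is the default. The class `ActivePreferences` below is a hypothetical sketch of that rule, not code from the PrimeLife prototype.

```python
class ActivePreferences:
    """Holds exactly one active preference group; the latest selection overrides earlier ones."""

    DEFAULT = "Nearly Anonymous"  # most privacy-friendly predefined group

    def __init__(self):
        self.active = self.DEFAULT
        self.history = []

    def select(self, source, group):
        """source is 'bookmark', 'location bar' or 'dialog' (labels chosen for this sketch)."""
        self.history.append((source, group))
        self.active = group
        return self.active


prefs = ActivePreferences()
initial = prefs.active
prefs.select("bookmark", "My Shopping")       # chosen when opening the site
prefs.select("location bar", "Minimal Data")  # changed while browsing
prefs.select("dialog", "Requested Data")      # changed while disclosing data
print(initial, "->", prefs.active)
```

Because all three entry points write to the same state, a change made through one of them is automatically what the other two display.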


14.3.2 The “Send Data?” Dialog 


The “Send Data?” Dialog appears in situations when a data controller requests a
user to disclose personal data, for instance when completing an online shopping
transaction. The dialog informs the user about the data controller’s privacy policy and how
well it matches the user’s privacy preferences. It allows the user to either give her 
consent for sending personal data or to cancel the transaction. Moreover, it provides 
functions for the “on the fly” customisation of groups of preferences. 

It should be noted that while the technical P3P and PPL vocabularies use the term 
“privacy preferences,” we use in the “Send Data?” Dialog UIs the term “privacy 
settings” for the same concept, as we think that “preferences” is not the best term 
for conditions chosen or set by the user, which are not optional but that should be 
binding and can only be explicitly overruled by the user herself. Furthermore, tests 
of the comprehensibility of privacy terminology conducted by PrimeLife partner 
CURE showed that the term “privacy settings” was understood by the majority of 
test participants (5 out of 8), while “privacy preferences” was only understood by a
minority (2 out of 14).

The “Send Data?” Dialog is one of the main user interfaces for the PPL engine 
and has to fulfill several PPL-specific requirements as well as legal and HCI-specific 
requirements, which are all listed below: 


• Displaying the data controller’s policy in an easily comprehensible manner as a
  basis for obtaining the user’s informed consent to data disclosures. For achieving
  this, we are in particular following the Art. 29 Working Party’s recommendation
  of displaying policies in multiple layers.

• Assisting the user in the process of selecting one combination of credentials that
  can be used to fulfill a data request by a data controller (as specified in a resource
  policy). The PPL engine provides the UI with all possible combinations of the
  user’s credentials that can be used for the disclosure in question; it is up to the
  user, with the aid of the UI, to select which combination to use.

• Allowing the user to fill in attribute values for uncertified attributes. A resource
  policy can require both certified and uncertified attributes to be disclosed. For
  certified attributes, the values can be found in the credentials selected by the
  user. Uncertified attributes, on the other hand, can have any value the user wants.
  A good example of a usually uncertified value is the user’s email address.

• Providing the user with documentation and feedback information on the different
  aspects of the interface that will help clarify their intentions. Since the concept
  of online privacy is not simple to understand, it is at times necessary to assist the
  user in an unobtrusive manner.

• Making the user understand that the dialog handles information on the client’s
  side and is not part of a service provided by a data controller. During previous
  HCI work, we detected that users often have difficulties in differentiating the user
  side from the data controller’s side [PFHD*05].

• Clearly displaying policy mismatches in an informative but not too alarming
  manner, so that users will make rational decisions on how to proceed. In case
  of a mismatch, the option to overrule the user’s preferences for the current
  transaction only or for all future transactions shall be offered to the user. The second
  option requires the user to customise her current profile of privacy preferences
  accordingly “on the fly.”

Moreover, PPL has the following properties that need to be taken into
consideration:


• Personal data attributes requested for certain purposes cannot be marked as
  optional. Opt-ins for the use of attributes for certain purposes (e.g., use of an email
  address for marketing purposes) have to be made on the data controller’s side
  before the PPL request for personal data is sent to the user. If the user opts in at
  a site, the data request from that site will include the attributes for the purposes
  that the user opted in to.

• It is possible to specify that data will also be forwarded to a third party
  (downstream data controller), but the identity of that third party cannot be specified.
  If, however, a site requests data that should only be used by another site, which
  thus acts as the data controller, and if the data is therefore encrypted with a key of
  that data controller, and the site requesting the data forwards the encrypted data
  directly to that data controller, then it is possible to specify the identity of that
  data controller in PPL (e.g., if a web shop requests payment data encrypted with
  the public key of a payment provider, then the identity of the payment provider,
  who will act as a data controller, can be specified).
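
As a rough illustration of two of the requirements listed in this section (using one selected credential combination for certified attributes and letting the user fill in uncertified ones), the following sketch assembles a disclosure. The function `build_disclosure` and all data structures are assumptions made for this example and do not reflect the interface of the PPL engine.

```python
def build_disclosure(requested, combination, user_input):
    """Assemble the values to disclose.

    requested:   {attribute: "certified" | "uncertified"}, as demanded by the policy
    combination: {attribute: (credential_name, value)}, the combination the user selected
    user_input:  {attribute: value}, values typed in by the user
    """
    disclosure = {}
    for attribute, kind in requested.items():
        if kind == "certified":
            if attribute not in combination:
                raise KeyError(f"no credential selected for {attribute}")
            credential, value = combination[attribute]
            disclosure[attribute] = {"value": value, "certified_by": credential}
        else:
            disclosure[attribute] = {"value": user_input[attribute], "certified_by": None}
    return disclosure


# Hypothetical request: a certified address and an uncertified e-mail address.
requested = {"address": "certified", "e-mail": "uncertified"}
combination = {"address": ("Passport", "Parcelstation 101")}
user_input = {"e-mail": "john@doe.eu"}

disclosure = build_disclosure(requested, combination, user_input)
print(disclosure)
```

Keeping the certifying credential next to each value also reflects that the issuer of verified data is sent together with the data itself.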


14.3.2.1 Meeting Requirements with the Design of the “Send Data?” Dialog 


The list of legal, implementation-specific and HCI requirements given above deter- 
mined to a great extent the development process and design decisions of the “Send 
Data?” Dialog. The user interface of the fifth iteration cycle, for which we conducted 
usability tests, is shown in Figure 14.4. 

The following paragraphs explain in closer detail how this design of the “Send 
Data?” Dialog meets the specified requirements. 

The dialog is divided into an informative top panel and three other main panels
conveying different privacy related information to the user. The purpose of the top 
panel, with the title ‘Why am I seeing this dialog’, is to describe the motivations of 
the dialog and how it can help the user protect her privacy. 

The first main panel, with the title ‘Data requested by’ as shown in Figure 14.4, 
lists all data controllers that are requesting some kind of information for a particular 
transaction. Along with the data controller’s name, the user can find the data con- 
troller’s contact information and a direct link to its full privacy policy (in order to 
fulfill the recommendation of Art. 29 Working Party). Data controllers are preceded
by a circled number, such as ①, ②, ③, etc., in order to create a visual connection to
their representation inside the two-dimensional table found under the second panel.

The second panel, titled ‘Your data is requested for the following purposes’,
contains the table shown in Figure 14.5, which corresponds to the simplified Grid of
a Privacy Nutrition Label presented in [KBCR09]. This way of presenting privacy
policies in the form of a nutrition table was encouraged due to the positive user
understanding results reported by [KBCR09]. In the “Send Data?” Dialog, by looking
at the nutrition table the user can perceive what types of personal data are requested
in the table’s rows, and for what purposes these data will be used in the table’s
columns.


[Screenshot of the “Send Data?” Dialog with four panels.
Panel ‘Why am I seeing this dialog?’: “This dialogue provides information about the
privacy policies of services sides requesting personal data and shows how far these
policies are compliant with your privacy settings. You have the choice to either
consent to the disclosure of the requested personal data under these policies or to
cancel the transaction.”
Panel ‘Data requested by’: 1) eBay Inc, checkout (www.ebay.com, contact@ebay.com),
view full privacy policy; 2) DHL International (www.dhl.com, contact@dhl.com),
view full privacy policy.
Panel ‘Your data is requested for the following purposes’: table with the columns
Admin, Contact, Delivery, Marketing and Payment and the rows Address (certified by
Passport [Belgian]), Cardnumber and Expirationdate (both certified by Visa Credit
Card [My private card]).
Panel ‘Privacy policy matching’: Mismatch; found mismatches: obligation mismatch;
my current privacy settings: Nearly Anonymous; option ‘Accept mismatch for this
transaction only’; Cancel button.]

Fig. 14.4: The “Send Data?” Dialog.

There are several differences in the way the nutrition table is used in the “Send 
Data?” Dialog compared to the one suggested by [KBCR09]. First of all, the “Send 
Data?” Dialog allows displaying policies of more than one data controller requesting 
personal data. Our design of the “Send Data?” Dialog uses icons similar to those in 
[KBCR09] inside each cell to indicate whether data is requested for a specific purpose 
or not. In addition, the “Send Data?” Dialog uses further icons as cell entries 
to show which data controller (represented by the circled numbers) is requesting the 
data (arrows pointing to the circled numbers). The user is also able to see if the 
privacy policy specifies that her personal data will be forwarded to a third party for 
a specific purpose (a double arrow), whereas the nutrition table presented in 
[KBCR09] only allows for specifying that data will be shared with third parties with 
the help of extra columns but without restricting the purposes for which the data may 
be shared or forwarded. Furthermore, in contrast to [KBCR09], combo-box controls 
are used in our UIs to let the user select certifying credentials or freely type in 
uncertified credentials. Moreover, the graphical icons used to indicate whether data 
is being requested or not were also modified to better fit the purposes of the “Send 
Data?” Dialog. A more detailed description of these differences can be found in 
[Pri10d]. 

The use of this two-dimensional table aims at satisfying the requirement of pro- 
viding the user with a more understandable summary of the data controllers’ privacy 
policies, while at the same time following Art. 29 Working Party’s recommenda- 
tions of displaying only core policy information on the top layer of a multi-layered 
structured policy display. 
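
As an illustration only, the two-dimensional table described above could be modelled as a mapping from (data type, purpose) cells to the data controllers requesting that cell. The sketch below is not taken from the PrimeLife prototype: the names `Request`, `PurposeGrid`, `add` and `cell`, as well as the plain-text rendering with `>`, `>>` and `-`, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Request:
    controller: int          # circled number identifying the data controller
    forwarded: bool = False  # True if the policy forwards the data to a third party


@dataclass
class PurposeGrid:
    # Maps (data type, purpose) to the requests made for that cell.
    cells: Dict[Tuple[str, str], List[Request]] = field(default_factory=dict)

    def add(self, data_type: str, purpose: str, controller: int,
            forwarded: bool = False) -> None:
        """Record that a controller requests a data type for a purpose."""
        self.cells.setdefault((data_type, purpose), []).append(
            Request(controller, forwarded))

    def cell(self, data_type: str, purpose: str) -> str:
        """Render one cell: '>' sent to, '>>' forwarded, '-' not requested."""
        requests = self.cells.get((data_type, purpose), [])
        if not requests:
            return "-"
        return " ".join((">>" if r.forwarded else ">") + str(r.controller)
                        for r in requests)


grid = PurposeGrid()
grid.add("Address", "Delivery", controller=1)
grid.add("Address", "Delivery", controller=2)
grid.add("Email", "Marketing", controller=1, forwarded=True)

print(grid.cell("Address", "Delivery"))      # >1 >2
print(grid.cell("Email", "Marketing"))       # >>1
print(grid.cell("Cardnumber", "Marketing"))  # -
```

Keeping the forwarding flag on each request, rather than in extra columns, mirrors the point made above that forwarding is shown per purpose.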

The design of the credential selection for the “Send Data?” Dialog was based on 
a card-based approach as described in Chapter 11.5. In this approach a plastic card 
metaphor is used to represent the attributes that can be contained in, for example, 
a typical credit card or identity card. The requirement imposed by the disclosure 
of certified and uncertified attributes is fulfilled by employing different interface 
controls. By using textbox controls, the user is given the opportunity to enter any 
value for attributes that are uncertified. With the use of combo-box controls, the 
user is obliged to select a credential that certifies the attribute that is about to be 
sent and cannot be marked as optional. Furthermore, the user can see, but not edit, 
the values corresponding to the certified credentials that are displayed inside non- 
editable textboxes. 
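
The choice of interface control described above can be summarised in a few lines. This is a minimal sketch under our own assumptions: the `Attribute` type, its field names and the dictionary returned by `choose_control` are hypothetical and do not reflect the actual add-on code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Attribute:
    name: str
    must_be_certified: bool                               # requested as a certified attribute?
    credentials: List[str] = field(default_factory=list)  # credentials able to certify it


def choose_control(attr: Attribute) -> Dict[str, object]:
    """Pick the control used to disclose an attribute."""
    if attr.must_be_certified:
        if not attr.credentials:
            raise ValueError(f"no credential available to certify {attr.name}")
        # Certified attribute: the user selects a credential from a combo-box
        # and sees the certified value in a non-editable textbox.
        return {"selector": "combo-box",
                "options": list(attr.credentials),
                "value_editable": False}
    # Uncertified attribute: the user may type any value into a textbox.
    return {"selector": "textbox", "options": [], "value_editable": True}


card = Attribute("Cardnumber", True, ["Visa Credit Card [My private card]"])
email = Attribute("Email", False)

print(choose_control(card)["selector"])   # combo-box
print(choose_control(email)["selector"])  # textbox
```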


[Screenshot of the purposes table. Rows: Address, Cardnumber, Expirationdate, Email; columns: Admin, Contact, Delivery, Marketing, Payment. Legend: data is requested; data is neither requested nor sent; data will be sent to; forwarded to third party.]


Fig. 14.5: The “Send Data?” Dialog — Your data is requested for the following pur- 
poses. 


In the bottom panel, with the title ‘Privacy policy matching’, the user is given 
visual feedback on whether her privacy settings (i.e., her current preferences) match 
the data controller’s privacy policy or not, as shown in Figure 14.6. This feedback 
is provided in the form of an indication icon based on a metaphor of two puzzle 
pieces fitting together, one representing the user’s settings and the other one 
representing the data controller’s privacy policy. If the pieces do not fit, it serves as an 
indication that the user’s settings and the data controller’s policy do not match. The 
icons are enhanced with text to clarify their meaning to the users, and are given the 
titles “Match ✓” or “Mismatch X,” and each puzzle piece is named “Settings” and 
“Policy.” Furthermore, the user is given a list of found mismatches and the reasons 
why a mismatch occurs. These images representing a “Match ✓” or “Mismatch X” 
fulfill the requirement of clearly displaying policy mismatches in a not too alarming 
manner, since the user is able to perceive that something is not right, but not 
shockingly wrong, while at the same time feedback is given on what is creating the 
mismatch. 

At the same time, the user is given the possibility to adapt her current privacy set- 
tings as the transaction takes place, thus satisfying the requirement for “on the fly” 
privacy settings management. If a mismatch exists, the user is still able to proceed 
with the transaction if she consciously accepts the mismatch and explicitly acknowl- 
edges that her current privacy settings are overruled. Additionally, she is given the 
possibilities to overrule her settings for the current transaction only, to update her 
settings for future transactions, or to adapt her settings for future transactions and 
save them under a new name. 
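
The matching and overriding behaviour described in the last two paragraphs can be sketched as follows. The representation is deliberately simplified and is our assumption, not the policy engine's actual data model: settings are a mapping from data type to the purposes the user allows, a policy is a list of (data type, purpose) pairs, and the names `OverrideScope`, `find_mismatches`, `may_send` and `apply_override` are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set, Tuple

Settings = Dict[str, Set[str]]      # data type -> purposes the user allows
Policy = Iterable[Tuple[str, str]]  # (data type, purpose) pairs requested


class OverrideScope(Enum):
    THIS_TRANSACTION_ONLY = "for this transaction only"
    UPDATE_SETTINGS = "update settings for future transactions"
    SAVE_AS_NEW = "save adapted settings under a new name"


def find_mismatches(settings: Settings, policy: Policy) -> List[str]:
    """List every requested use that the user's settings do not allow."""
    return [f"You do not allow your {data_type} to be used for: {purpose}"
            for data_type, purpose in policy
            if purpose not in settings.get(data_type, set())]


def may_send(mismatches: List[str], accepted: bool = False) -> bool:
    """Data may be sent on a match, or if the user explicitly accepts the mismatch."""
    return not mismatches or accepted


def apply_override(settings: Settings, policy: Policy,
                   scope: OverrideScope) -> Settings:
    """Return the settings to keep after an accepted mismatch."""
    if scope is OverrideScope.THIS_TRANSACTION_ONLY:
        return settings  # stored settings stay untouched
    # For the other two scopes the adapted settings allow the requested uses;
    # storing them under the old or a new name is left to the caller.
    adapted = {data_type: set(purposes) for data_type, purposes in settings.items()}
    for data_type, purpose in policy:
        adapted.setdefault(data_type, set()).add(purpose)
    return adapted


settings = {"Email": {"Contact"}}
policy = [("Email", "Contact"), ("Email", "Marketing")]
mismatches = find_mismatches(settings, policy)
print(mismatches)                           # one mismatch, for Marketing
print(may_send(mismatches))                 # False
print(may_send(mismatches, accepted=True))  # True
```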


[Screenshot of the “Privacy policy matching” panel: a “Mismatch” puzzle icon, the list of found mismatches (“Obligation mismatch”), the current privacy settings (“Nearly Anonymous”), an “Accept mismatch” checkbox with the option “for this transaction only”, and a “Cancel” button.]


Fig. 14.6: The “Send Data?” Dialog — Privacy policy Matching. 


Implementing the “Send Data?” Dialog as a Firefox 4 add-on has the advantage 
of forcing the dialog to ‘pop-up’ on top of the browsed page, due to the browser’s 
new notification system. Therefore, when the “Send Data?” Dialog appears, the 
website behind it is dimmed, as shown in Figure 14.7, thus taking the focus of 
the user away from the website and placing it on the dialog. In this way users could 
have a clearer understanding that their information and privacy settings are being 
managed on their side and not on the data controller’s side, since it is made clear to 
the user that the dialog belongs to the browser, which is installed locally. 


Fig. 14.7: The “Send Data?” Dialog — The website is dimmed as the dialog opens 
on top. 


14.3.3 Testing the Usability of the “Send Data?” Dialog 


As discussed earlier in this chapter, the concept of online privacy management and 
the adjustment of privacy settings is hard to understand for the average user, since 
there is no equivalent metaphor in the offline world. This created a challenge when 
carrying out usability tests for the design of the “Send Data?” Dialog 
presented here. Special consideration was given to the way test participants 
were introduced to the test sessions and to the explanation they were given about the 
dialog before each session started. 

Twelve test sessions were carried out at Karlstad University with participants 
coming from different occupations and social backgrounds (three coming from Asia, 
one from Africa, and eight from different European countries), of whom seven 
were female and five were male, ranging from 19 to 34 years of age. In each test 
session the participants were introduced to the concept of the “Send Data?” Dialog. 
They were shown the dialog for some time and were asked to respond to predefined 
interview questions while they interacted with the dialog. Notes were taken and, 
during some of the test sessions, the computer screen was recorded as well as the 
participant’s eye-gaze. At the end, the participants were asked to fill in a question- 
naire containing selected PET-USES measurements, as presented in Section 10.4. 

The general observations from the test sessions and other expert evaluations, 
along with some suggestions for improvement, can be summarised in the following 
points: 


• Participants in general understood that their information was not yet sent at the 
moment the “Send Data?” Dialog appeared, and that they had the opportunity 
of cancelling the transaction without compromising their information on the in- 
ternet. What is more, participants understood that the program was not a service 
provided by the data controller but that all their information was handled locally, 
until the moment the Send button was pressed. 

• Eye-tracking data showed that participants usually skipped the information text 
provided to them on the top panel. Many of them confessed that they either did 
not read it at all, that they read it but did not understand it, or that they read it only 
when they did not understand something else and were hoping to find answers 
in it. No participant tried to look for information behind the question mark icons, 
even when they were having trouble answering one of the interview questions. 
It has been suggested to remove the top panel in the next iteration cycle of the 
dialog. 

• The test participants often failed to quickly recognize the connection between the 
circled numbers shown beside the names of the data controllers and the circled 
numbers in the two-dimensional table. Therefore, it was hard for the participants 
to understand who was requesting their data. However, after some time spent 
familiarizing themselves with the interface, participants noticed this connection 
and most of them were able to determine which data controller was receiving which 
information. Representing data controllers with colors has been suggested as a 
way to make this connection more visible and understandable to the users. Also, 
placing the list of data controllers below the nutrition table and nearer to the 
table’s other legends might make this relation easier to perceive and understand. 

• Most participants believed that the two-dimensional nutrition table represented 
the extent to which their privacy settings matched the data controller’s privacy 
policy. During the test sessions, participants tended to click on the red plus icon, 
and they expected that the icon would change its look into a more positive 
state. One participant even commented that “it [the icon] looks like a button.” 
Moreover, eye-tracking data suggests that the participants’ visual attention is 
placed on the table as soon as the dialog opens, which might be due to the table’s 
central placement and its contrasting red and blue colors. Expert evaluators 
have suggested removing these icons when data is being requested, since there 
is already a symbol indicating that data will be sent to a data controller, namely 
the arrows pointing to the circled numbers and the double arrow for forwarding 
data to a third party, and placing a simpler icon when data is not being 
requested at all. 

• Participants had a very hard time understanding how some attributes of their 
personal information are certified with the use of credentials. It was hard for 
them to grasp that their information could be somehow stored in a computer 
and be ready to use for completing online transactions. In some cases, they even 
believed that their information was already stored somewhere on the Internet, 
and also that the data to be sent was the certifying credential itself and not the 
attribute. 


• The puzzle metaphor represented by the image on the bottom panel, which indicates 
a match or a mismatch of privacy settings, was also not immediately clear 
to some test participants. Two of them thought at first that the image was a 
corporate logo or a piece of advertisement. However, when the “Mismatch X” 
image was pointed out, the majority of participants seemed to grasp the idea that 
there was something not right in the way the data controllers would use their 
information. One participant suggested that the puzzle “image should be located 
at the top” of the dialog and that it should be more visible when the dialog pops 
up. This suggestion was considered relevant, since one of the main intentions 
of the user when seeing the dialog is to identify whether her privacy preferences match 
the privacy policy of the data controllers, and the puzzle image is 
a graphical representation of this information. However, another important intention 
of the user is to distinguish which pieces of information are going to be sent 
in a particular transaction. Thus, placing the matching or mismatching elements 
at the bottom of the dialog encourages the user to verify the information to be sent 
in a transaction first and helps her make a more conscious decision. 

• From the results of the PET-USES, it can be concluded that, in general, participants 
regarded the dialog as a useful tool to understand who would receive their 
information, to know what type of information they were releasing, and to make 
it easy to decide the amount of information to release for each transaction. 
Interestingly enough, the participants reported that they did not feel safe enough 
releasing personal information even when the dialog conveyed it was safe (i.e., 
two test participants strongly disagreed, three disagreed, and three neither agreed 
nor disagreed with the PET-USES statement ‘I feel safer releasing my information 
when the system states it’s ok’). However, three participants strongly agreed 
and seven agreed with the statement ‘I feel safer knowing that I will be notified by 
the system if I’m about to release more information than my chosen preference’. 


14.3.4 Conclusions and Outlook 


In this section, we presented our work on user interfaces for the privacy policy 
engine and their evaluations. 

In general, it is fair to say that most participants of the usability tests that we con- 
ducted understood the purpose of the program after some minutes of reflection and 
familiarisation with it. A couple of participants expressed their positive reactions 
towards the implementation of the “Send Data?” Dialog concept, arguing that they 
would be very interested in using such a program if it were fully implemented. The 
valuable feedback from these test sessions was used to improve the look-and-feel of 
the dialog and to make it more understandable for the average user, resulting in the 
design suggestion shown in Figure 14.8. 

Feedback has also been collected from initial usability tests of this last redesign. 
Results from the latest round of testing revealed a general improvement in the us- 
ability and general user understanding of the “Send Data?” Dialog prototype. For 


[Screenshot of the redesigned dialog. The table “Your data will be sent and used for the following purposes” has the rows Name, Cardnumber, Expirationdate and Email and the columns Admin, Contact, Feedback, Marketing and Payment. Its legend distinguishes data that will be sent to a numbered data controller (eBay Inc. checkout, Visa), data that will be forwarded to others, and data that will not be sent. The “Privacy policy matching” panel lists the found mismatches (“You do not allow your Email to be used for: Marketing”; “You do not allow your Email to be forwarded to others for: Marketing”), shows the current privacy settings (“Minimal Data”), and offers the option “for this transaction only” with the buttons “Cancel” and “Send”.]


Fig. 14.8: The “Send Data?” Dialog — New redesign after the feedback obtained 
from the current iteration cycle. 


example, participants found it easier to understand which attributes of the certifying 
credentials were about to be sent to the data controllers. Also, by removing the red 
and blue icons from the table, participants did not focus most of their attention on the 
table and other elements of the interface were more noticeable. Participants demon- 
strated having a good understanding of how to change their privacy settings and of 
the concept of “on the fly” privacy management. Most participants also expressed a 
better acceptance of the icons and images used. More importantly, participants had 
a better understanding of what a mismatch consisted of and how the program could 
help them protect their privacy. On the downside, some of these findings suggest that 
users still think that the nutrition table representing the purposes for which data will 
be used is a representation of their own privacy settings. Participants seemed to still 
think that the table shows the degree to which their privacy settings match the data 
being requested by the data controller. 

Further work will continue to investigate additional improvements to the 
“Send Data?” Dialog’s user interface. Examples include the possible use of subtle icons 
to represent the types of credentials listed in the rows, replacing the circled numbers 
with the data controllers’ corporate logos to provide the user with stronger visual 
cues, as well as other visual and interaction enhancements. We might also extend 
the compatibility of the “Send Data?” Dialog to other web browsers. 
Furthermore, integrating the privacy editor in which users can modify their privacy 
settings is also considered for future work. 


Chapter 15 
Privacy Policy Icons 


Leif-Erik Holtz, Harald Zwingelberg, and Marit Hansen 


Abstract Many individuals are not aware of who is collecting and handling their 
personal data for what purpose. Usually privacy policies are too long, too compli- 
cated to understand, and reading them is hardly appealing. To improve the aware- 
ness and comprehension of individuals on what is happening with their personal 
data, privacy icons are being proposed. The PrimeLife project has developed icon 
sets for different use cases such as e-commerce, social networks and handling of e- 
mails. It conducted user tests and an online survey to analyse how well users under- 
stand what the privacy icons should express. This section summarises the findings 
of PrimeLife’s work on privacy icons. 


15.1 Introduction 


Content that should be quickly understood by a broad audience is often expressed 
via icons, e.g., symbols pointing to a fire exit or a subway station. In general, 
well-designed icons are able to convey information by means of one single graphical 
representation expressing the relevant content in a manner understandable for wide 
audiences, ideally even across cultural domains. This leads to the idea of developing 
icons that inform data subjects about privacy-relevant issues. In particular, parts 
of privacy policies could be expressed by icons making the content of these legal 
documents easier to access and comprehend. Other use cases for icons within the 
privacy area intend to display privacy aspects and implications of user actions in 
the user client, on websites or in social networks. Within the PrimeLife project, this 
idea has been taken further, scenarios for the deployment have been developed, and 
sets of icons were tested within usability tests. 

This section elaborates on the motivation for the approach chosen by PrimeLife, 
describes the developed icon sets and presents results from a user test and an online 
survey. While these icon sets deal with services offered by data controllers who are 
the addressees for incorporating icons in their service, a different area of use can 
be a peer-to-peer scenario in which users themselves attach to their data information 
on how they want others to handle these data. An example of this approach for 
the scenario of sending and receiving e-mails is sketched at the end of this section. 
Finally, the conclusion and outlook section summarises the findings and further steps 
to be taken in research and development as well as by policy makers. 


15.2 Motivation for Introducing Privacy Icons 


The potential area of application for privacy icons is broad [Han09]. Icons may 
be deployed for indicating rights and limitations for one’s own data provided via e-mail, 
for social networks, blogs, or websites prominently showing an illustrated abstract 
of their privacy policy. Machine-readable policies, as provided by some websites, 
may also be interpreted and translated into icons on the client side. Also third-party 
services commenting on others’ privacy policies, e.g., [GPS09], may deploy icons in 
their implementations. 

However, it should be clear that the use of icons cannot and is not intended to replace 
fully written privacy policies as the basis for informed consent, according to 
European Privacy regulations. But privacy icons may be used to supplement written 
privacy policies, pointing to relevant sections, e.g., by adding them as an initial to a 
paragraph [Pri08, p. 28]. Icons may also serve as a very abstract but easy to access 
level of layered privacy policies as has been suggested by Art. 29 Working Party 
[Par04]. 

Within the PrimeLife project, a large icon set has been developed with the aim to 
test the user’s understanding in respect to more complex or less known principles. 
The research findings will aid in the development of understandable icon sets for 
use in specific environments. 


15.3 Related Work 


The idea of expressing relevant statements from privacy policies using icons had 
been developed earlier within the privacy community. To our knowledge, Mary Run- 
dle was the first to propose a draft set introducing icons for privacy statements in a 
Creative Commons-like style [Run06]. Her approach offers a brief set of icons for 
different purposes and data types and was meant to foster further discussion on this 
topic. The icons do not take the existence of a legal data protection framework for 
granted, but rather build upon the U.S. understanding of privacy [Han09]. Matthias 
Mehldau independently developed an icon set, also inspired by the Creative Com- 
mons licenses [Meh07]. Based on European understanding of data protection, his 
approach does not only depict data types and groups of recipients but also includes 
icons for certain purposes. Other proposals for introducing a graphical representa- 
tion of privacy statements or privacy properties were made by Bickerstaff [Bic09], 
Kelley et al. [KBCR09], Helton [Hel09], Raskin [Ras10] and in the KnowPrivacy 
report [GPS09]. In the area of social networks, Iannella and Finden have 
attempted to introduce a set of icons on access rights from peers to the user’s data. 
These icons are accompanied by machine-readable statements in a policy language 
[IF10]. For a detailed introduction and references to further suggestions and analysis 
see the related PrimeLife Deliverables D4.3.1 [Pri08] and D4.3.2 [Pri10d]. 


15.4 PrimeLife Icon Sets 


So far, two icon sets have been developed within PrimeLife: one for general use 
and a specific set limited to show which user group is supposed to gain access to 
one’s profile in a social network. Note that the proposed icons are not meant to be 
warning symbols for potentially risky processing, but rather aim at describing in 
a neutral way what happens with personal data. Some types of processing that are 
usually understood to be harmful may in some cases even be intended by a data 
subject, e.g., person-related profiling of interests can be of major value in a dating 
service to find the person sharing these interests. 

This section firstly introduces the main characteristics of the different icon sets 
and then presents first results of user tests and an online survey for the evaluation of 
these icons. 


15.4.1 PrimeLife Icon Set for General Usage 


The icon set for general use contains the sub-categories data types, groups of recipi- 
ents and processing steps including references to common purposes. Icons depicting 
data types include: personal data, sensitive data, medical data, payment data [Pri10d, 
p. 39]. Icons referring to specific types of processing and purposes comprise: legal 
obligations, shipping, user tracking, profiling, storage, deletion, pseudonymisation, 
anonymisation, disclosure to third parties and data collection from third parties 
[Pri10d, p. 39]. This icon set is suitable for being used in e-commerce scenarios. As 
a typical example, it contains a specific icon that expresses how long the user’s IP 
address is stored. Already, the existence of such an icon may create awareness that 
even the storage duration of IP addresses can be relevant for compliance with data 
protection laws both on the service’s and on the user’s side. The use of icons on their 
own does not always achieve full clarity on the user’s side about the intended data 
processing: For instance, the purpose “legal obligations” reflects the claim of the 
data controller that there is some regulation that demands data processing performed 
in the service’s workflow, but does not go into detail. This information should be 
given at least in the full privacy policy of the data controller. 


15.4.2 PrimeLife Icon Set for Social Networks 


In social networks, further privacy-related statements are helpful for users, in par- 
ticular to visualise who, usually in addition to the social network provider itself, 
will get access to which information. Therefore, the icon set for social networks 
contains icons for the following groups of recipients: selected individuals, friends, 
friends of friends, and the whole network or the general public. The icons 
from the social network icon set may be used for depicting the personal configu- 
ration of privacy settings or the audience selection of individual pieces of content, 
e.g., to directly choose (groups of) individuals that may or must not gain access to 
selected data. In addition, the icons may work as reminder whenever the user looks 
at her profile. They can also be used in combination with components of the icon 
set for general usage. 


15.5 Test Results 


The icon sets developed in the PrimeLife project have been tested. For this purpose, 
the test group was asked to evaluate alternative icons that should express the same 
aspects to see which of the developed icons fits best. In this section, we describe 
important results and show the icons that will be used for the final icon set(s). An 
earlier test revealed shortcomings when displaying a large variety of purposes for 
different use cases by too many icons. This led to the development of one icon set 
for general use and an additional icon set for usage within social networks. These 
sets have been tested with 17 Swedish and 17 Chinese students and in an online 
survey with 70 test persons. Participants of the online survey assessed themselves 
as being privacy-aware, whereas the students were unaware. The test set-ups and first results 
are described in [Pri10d]. Final test results and interpretations will be published in 
PrimeLife’s final HCI report [Pri11]. 

The icons for address data, medical data, payment data, for the purpose shipping 
and for the data processing procedures storage and deletion (see Figure 15.1) were 
voted to be understandable, clear and helpful by the students, both Swedish and 
Chinese. In addition, these icons were also rated better than alternatives in a 
web-based survey. 


Fig. 15.1: Excerpt of well understood icons for general usage tested by KAU. 


Survey participants were also asked to express suggestions and comments. The 
floppy disk symbol raised the concern that it relates to an outdated medium and 
might become unfamiliar and hard to recognise in the future. 

The online survey provided the participants with a scenario for each icon and 
asked how well an icon fitted the described purpose or, in case of alternative icons, 
which of them fitted best. Some of the icons were found to be intuitive and easy to 
understand. These icons received the majority of votes as well as positive comments 
and included the representation of payment data, the icon for storage periods, and 
the icon stating that information will only be transferred to individuals selected by 
the user (see Figure 15.2). 


Fig. 15.2: Excerpt of evaluated icons. 


The test of the icons for visualising the recipients in social networks indicated 
that research is on the right track. The survey showed that the participants found 
the icons shown in Figure 15.3 to best fit in order to depict the following groups of 
recipients: selected individuals, friends, friends of friends and the whole network. 


Fig. 15.3: Excerpt of icons showing different groups of recipients within social net- 
works. 


The test results suggest that clear icons with few details are preferred. For the 
more complex concept “friends of friends”, the icon from the initial PrimeLife draft 
[Pri08] was chosen where the circles indicating friendship were seemingly clearer. 
Of course, a unification of the design style would be necessary before proposing 
these icons for a wider use. 

For PrimeLife’s further research this raises the question: to what extent can and 
should more complex concepts be visualised by icons? A possible solution may be 
to develop sets of icons for specific environments and use cases strictly limited to 
a handful of easy to grasp icons, as has been suggested for the use in e-mail with the 
Privicons (see Section 15.6). 


15.6 An Approach for Handling E-mail Data: Privicons 


“Privicons” is the name of an approach developed by researchers of Stanford Uni- 
versity, the PrimeLife project and interested individuals to convey the information 
about how the data in e-mails should be handled [Pro10]. With the Privicons approach, 
the sender of an e-mail message has a means to express her preferences on how the 
message content should be handled by the receiving user(s). For this purpose, the 
semantics of six icons in a graphical as well as in pure ASCII representation (“Priv- 
icons”) are described (cf. Figure 15.4 that shows both kinds of representation for 
the six proposed Privicons). These can syntactically be integrated either in the first 
line of the body, in the subject line and/or in a dedicated header of any e-mail mes- 
sage. E-mail user agents can be implemented that support the users in handling the 
messages according to the Privicons statements given in Figure 15.4. 


[X] Keep Secret       [=] Delete After Reading 
[o] Keep Internal     [-] Don’t Attribute 
[>] Please Share      [/] Don’t Print 


Fig. 15.4: Privicons. 


By using the “keep secret” Privicon, the sender of the e-mail requests the recipi- 
ent(s) to keep the received message secret. Related is the usage of the “keep internal” 
Privicon by which the receiving users are asked to present the e-mail message only 
to those people that are common friends, or otherwise qualify as “internal”, e.g., by 
being part of a group of people that are in a tight relation to both the sending user 
and the respective receiving user. 

In contrast, the Privicon “please share” depicts the sender’s request or offer to 
the recipients to share the e-mail. 

Another confidentiality-related Privicon is “delete after reading”: the 
sender requests the recipient(s) to delete the e-mail after reading it. The “don’t 
attribute” Privicon addresses information about the sender, i.e., it asks the receiving 
user(s) not to attribute, name or mention the original sending user of the e-mail 
message in any way. If not stated otherwise, the receiving user(s) may quote, follow or 
paraphrase the content, facts and opinions voiced in the original e-mail message. 
Finally, the Privicon “don’t print” is self-explanatory. 
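
How a mail client might detect these markers can be sketched in a few lines of Python. The sketch is illustrative only: the header name X-Privicon, the function name find_privicons, and the reading of the "keep internal" ASCII form as [o] are assumptions made here and are not taken from the Privicons proposal itself.

```python
import re
from email.message import EmailMessage

# ASCII Privicons and their meanings, following Fig. 15.4.
# "[o]" for "keep internal" is an assumed reading of the figure.
PRIVICONS = {
    "[X]": "keep secret",
    "[o]": "keep internal",
    "[>]": "please share",
    "[=]": "delete after reading",
    "[-]": "don't attribute",
    "[/]": "don't print",
}

HEADER = "X-Privicon"  # hypothetical header name, chosen for this sketch
TOKEN = re.compile(r"\[[Xo>=\-/]\]")


def find_privicons(msg):
    """Collect Privicon meanings from the dedicated header, the subject
    line, and the first line of a plain-text (non-multipart) body."""
    body = "" if msg.is_multipart() else msg.get_content()
    first_line = body.splitlines()[0] if body.strip() else ""
    found = []
    for text in (msg.get(HEADER, ""), msg.get("Subject", ""), first_line):
        for token in TOKEN.findall(text):
            meaning = PRIVICONS[token]
            if meaning not in found:
                found.append(meaning)
    return found


msg = EmailMessage()
msg["Subject"] = "[X] Draft budget"
msg.set_content("[/] Please read on screen.\nDetails follow.")
print(find_privicons(msg))
```

A user agent built along these lines could, for example, warn the user before forwarding a message marked "keep secret"; the Privicons themselves only express the sender's request and do not enforce it.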

Meanwhile, the Privicons project has drafted a Request for Comments to initiate 
the debate on the icons as well as on how to embed them in day-to-day e-mail transfer. 
In this respect, the project is also working on proof-of-concept implementations. 


15.7 Conclusions and Outlook 


The results of the tests performed by the PrimeLife partners KAU and CURE indicate 
how the icon development should proceed. The well-rated icons will be 
refined to a final version of one or several smaller use-case-specific sets. For this, 
the test results will be analysed, inter alia, to answer the question of how complex 
the depicted concepts may be. In addition, the lessons learned so far suggest that we 
stick with clear and simple icons. 

The vast variety of research groups that work on some kind of privacy icons 
emphasises the need for standardisation. In parallel to standardisation efforts, which 
should involve data protection authorities and user organisations, among others, 
the approach of machine-readable privacy statements should be brought forward. 
The use of icons and the incorporation of machine-readable policies, as well 
as their relationship to each other and to today’s practice of presenting privacy 
policies in legalese on the service’s website, have to be spelled out. For instance, it 
should be avoided that users looking at the icons get a totally different picture of the 
intended data processing than those reading the privacy policy in natural language or 
those who rely on the interpretation of the machine-readable policy by their user 
client. Further, thought should be given to incentives for data controllers to inform 
data subjects in a better way than pointing them to the privacy policy, and to 
educating individuals so that they better understand all aspects that are relevant to 
their privacy. 


Part IV 
Policy Languages 


Introduction 


Machine-interpretable policy languages are at the heart of any modern privacy in- 
frastructure. Rather than “hard-coding” fixed privacy policies into the infrastructure, 
dedicated policy languages provide the flexibility to express and change policies 
without having to re-implement the software that enforces them. Moreover, if mul- 
tiple interacting parties agree on the grammar and semantics of a language, policy 
languages can also be used to communicate privacy policies across different inter- 
acting entities. Finally, security and privacy policy languages are an important tool 
to ensure compliance with legal, industrial, and user requirements. 

PrimeLife set out to collect policy language requirements from the diverse sce- 
narios covered by the project, and to analyze the suitability of existing policy lan- 
guages to cover the privacy aspects. It quickly became clear that none of the existing 
languages covered all the needs. A report of these activities is given in Chapter 16. 
However, it also became clear that satisfying all of the collected requirements was 
far beyond PrimeLife’s available time and budget. We therefore hand-picked a num- 
ber of features based on their potential to improve digital privacy in the real world 
and on their feasibility within the restrictions of the PrimeLife project. 

Chapters 17 and 18 give an overview of the technical research results on two main 
aspects in which appropriate policy language support was found lacking. Chapter 17 
focuses on the relation between access control policies, which specify which entities 
are allowed to obtain certain information, and data handling policies, which specify 
how these entities are to treat the obtained information. This relation becomes par- 
ticularly complex if information can be forwarded to third parties, so-called down- 
stream usage. The chapter describes a language that allows users to automatically 
match their preferences to the policies proposed by servers, thereby assisting them 
in their choice of whether or not to reveal their information. 

Chapter 18 focuses on privacy-friendly access control policies. It proposes “cre- 
dentials” as a generalisation of several existing authentication technologies, cov- 
ering well-established technologies such as X.509 certificates and LDAP directo- 
ries, as well as anonymous credentials. Rather than assuming that all of a user’s 
attributes are revealed by default, the language expresses which credentials users 
need to possess in order to gain access, which attributes they have to reveal, and 
which conditions they have to satisfy. Policy sanitization strikes a balance between 
users’ privacy and enterprises’ sensitivities about the policy details. 

Chapter 19 takes a closer look at the legal requirements under European law to 
transparently inform users about how their information is used. Expressing such 
usage in an understandable way is a notorious challenge. Faced with the multitude 
of applications and usage purposes and the lack of a structured ontology among 
them, this chapter investigates the current practices in data usage in various contexts 
and discovers a common structure. 

Finally, a number of these concepts, in particular the research results presented 
in Chapters 17 and 18, were brought together in the design of the PrimeLife Policy 
Language (PPL). To be of use in real-world settings, PPL is defined as extensions 
to the industrial standards XACML and SAML. Chapter 20 reports on the architecture 
and implementation of a fully functional policy engine to enforce PPL, thereby 
bringing the advanced research concepts to life. 


Chapter 16 
Policy Requirements and State of the Art 


Carine Bournez and Claudio A. Ardagna 


Abstract The design and implementation of a versatile privacy policy language is 
one of the core activities in the PrimeLife project. Policy languages are a crucial tool 
in any privacy-aware information infrastructure. Machine-interpretable languages 
have a major advantage over natural languages in that, if designed properly, they al- 
low automated negotiation, reasoning, composition, and enforcement of policies. 
The requirements are the first step in the development of such a language. The 
methodology was to collect use case scenarios and derive concrete requirements 
from them. This chapter presents those requirements independently; they are not 
derived from research work other than the PrimeLife study itself. 


16.1 Definitions 


We first define the three types of policies that, in our view, are important parts of any 
privacy policy: data handling, access control, and trust policies. This by no means 
implies that we consider these to be separate, independent policies that together 
form the privacy policy. Rather, we see them as three minimal aspects that have 
to be covered by any policy language. There may be other aspects, and the three 
aspects mentioned here may not be orthogonal. 


16.1.1 Data Handling Policies 


A data handling policy (DHP) is a set of rules stating how a piece of sensitive 
data should be treated. In the context of privacy, we are mostly interested in the 
case where a piece of data is personally identifiable information (PII). The data 
handling policy specifies, among other things, for what purposes the data can be 
used (e.g., research, marketing), to which third parties the data can be disclosed 
(e.g., all, nobody, only auditors), and the obligations on data management (e.g., how 
long the data can be stored). Obligations define actions that must be executed by the 
party in charge of enforcing a policy. Those actions are triggered by events such as 
time or handling collected data. 

We distinguish between three different types of DHPs, namely data handling 
preferences on the data subject’s side, and data handling policies and sticky policies 
on the data controller’s side. In the data handling preferences associated with a 
piece of PII, the data subject specifies its requirements on how it expects the data 
to be treated by the data controller. The data controller, before receiving the PII, 
describes his intentions on how he will treat the PII in his data handling policy. 
When an agreement is reached, the agreed-upon policy that the data controller is 
required to adhere to is referred to as the sticky policy. 
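
The relation between the three kinds of DHP can be illustrated with a small Python sketch. It rests on assumptions that are not part of the definitions above: a DHP is reduced to three fields (purposes, recipients, retention period), and agreement is taken to mean that the controller's policy is no more permissive than the data subject's preferences. The names DataHandling and sticky_policy are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataHandling:
    purposes: FrozenSet[str]     # purposes the data may be used for
    recipients: FrozenSet[str]   # third parties the data may be disclosed to
    retention_days: int          # obligation: delete the data after this many days


def sticky_policy(preferences: DataHandling,
                  policy: DataHandling) -> Optional[DataHandling]:
    """Return the agreed-upon (sticky) policy, or None if no agreement is reached.

    Assumed matching rule: the controller's policy must stay within the
    data subject's preferences on every field.
    """
    within = (
        policy.purposes <= preferences.purposes
        and policy.recipients <= preferences.recipients
        and policy.retention_days <= preferences.retention_days
    )
    return policy if within else None


prefs = DataHandling(frozenset({"research", "shipping"}), frozenset({"auditors"}), 365)
offer = DataHandling(frozenset({"shipping"}), frozenset(), 90)
greedy = DataHandling(frozenset({"shipping", "marketing"}), frozenset({"all"}), 90)

print(sticky_policy(prefs, offer) is offer)   # the offer becomes the sticky policy
print(sticky_policy(prefs, greedy))           # no agreement
```

A real policy language would need a richer matching relation, for instance one that understands that "nobody" is stricter than "only auditors"; plain set inclusion is used here only to keep the sketch short.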


16.1.2 Access Control Policies 


An access control policy (ACP) protects access to an object by specifying which 
subjects should be granted which type of access to the object. The object being 
protected can be a piece of data such as a file, a database record, or a webpage, but 
it can also be a more abstract functionality such as a service or a remote procedure 
call. The subject can be specified by means of a unique identifier (e.g., user name), 
by role (e.g., administrator), by a group that he belongs to (e.g., helpdesk), or by 
other attributes (e.g., age, reputation, ...). A subject can be any type of entity that is 
capable of making a request; it could be a natural person but could also be a running 
process or a device, or a combination of these. The possible types of access (e.g., 
read, write) depend on the resource that is being protected. Finally, the decision to 
allow or deny access can be based on the subject’s properties, the content of the 
resource, the details of the access request (e.g., parameter values passed in a remote 
procedure call), and secondary information such as current time, processor load, etc. 
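
As a minimal sketch of such a decision, the following Python fragment evaluates rules whose conditions may depend on subject attributes and on secondary information (here, the current hour). The classes Request and Rule, the function decide, and the example resource "ticket-db" are hypothetical and serve only to illustrate the definition; deny-by-default is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Request:
    subject: dict   # attributes of the requester, e.g. {"role": "administrator"}
    resource: str   # the protected object
    access: str     # type of access, e.g. "read" or "write"
    hour: int       # secondary information: current hour (0..23)


@dataclass
class Rule:
    resource: str
    access: str
    condition: Callable[[Request], bool]


def decide(rules, request):
    """Grant access if some rule for this resource and access type has a
    satisfied condition; deny otherwise (deny-by-default is assumed)."""
    return any(
        rule.resource == request.resource
        and rule.access == request.access
        and rule.condition(request)
        for rule in rules
    )


rules = [
    # Members of the helpdesk group may read during office hours.
    Rule("ticket-db", "read",
         lambda r: r.subject.get("group") == "helpdesk" and 8 <= r.hour < 18),
    # Administrators may write at any time.
    Rule("ticket-db", "write",
         lambda r: r.subject.get("role") == "administrator"),
]

print(decide(rules, Request({"group": "helpdesk"}, "ticket-db", "read", 10)))
print(decide(rules, Request({"group": "helpdesk"}, "ticket-db", "read", 22)))
```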


16.1.3 Trust Policies 


The concept of trust is almost inherently vague due to its close association with 
the subjective decisions made by humans in real life. Even within the technical 
community, there seems to be quite some confusion about the definition of trust 
policies. In the scope of this chapter, we clarify the concepts related to trust that 
would be required in a policy language. The list of requirements will also contribute 
to delimiting the technical definition of trust. 

In general, a trust rule is a rule expressing that a specific entity is entrusted to 
perform a specific action if a specific condition holds. When placing trust in some- 
one or in an entity, the relying party expresses a belief that that person or entity will 
behave in a way that is beneficial to the relying party’s interests, and will not behave 
in a way that would harm its interests. Trust is contextualised, in that your trust in 
someone to perform certain actions is limited to within a given context. The action 
can be either to certify a specific piece of content (e.g., write about technology or 
issue a passport) or to adhere to an agreed-upon data handling policy. The condition 
can be any condition that has to be satisfied for the rule to apply. We leave open 
what kind of statements the condition can contain; conditions can be based on the 
credentials held by the entity, privacy labels, reputation, environmental conditions, etc. 

The two actions that we have in mind when talking about trust are trust in an en- 
tity to certify a specific type of content (content trust) and trust in an entity to adhere 
to an agreed-upon data handling policy (data handling trust). The main difference 
between both types is the direction of the information flow. For content trust, it is 
the trusting entity who receives information (possibly indirectly) from the trusted 
entity; for data handling trust, it is the trusted entity who receives information from 
the trusting entity. Content trust is about the correctness of received information; 
data handling trust is about how information is treated after it is transmitted. 


16.2 Legal Requirements 


Besides use cases and applications where policy languages will be used for privacy 
protection, legal requirements are imposed by institutions (governments, European 
Union). We briefly summarise the main legal requirements, which will later be 
translated into technical constraints on the policy language. 

The processing (including collection, storage, retrieval, transferral, and other 
means of handling) of any data that can be linked to a person (personal data, for 
a more thorough definition see Article 29 WP, Op. 136) by another entity (a data 
processor) has to be legitimate. If processing is not legitimate, those subjected 
to the processing (data subjects) would lose trust in markets 
and tend not to give away their data when acting in these markets. Therefore, data 
protection is a market enabler, but above all it is recognised as a fundamental right, 
and is acknowledged by many constitutions in the European Union, as well as in the 
Charter of Fundamental Rights of the European Union (cf. Art. 8 thereof) and the 
European Convention on Human Rights (Art. 8 thereof). 

On an operational level, the Directive 95/46/EC on the protection of personal 
data (Data Protection Directive) [Dir95], and the Directive 2002/58/EC on Privacy 
and Electronic Communications (E-Privacy Directive) provide the baseline for com- 
pliance. In many areas, sector specific regulation and national implementation of di- 
rectives need to be taken into account. Both directives, as well as most of the other 
regulations, follow a set of well established principles, with the principle of fair 
and lawful processing, the purpose limitation principle, data minimisation, and the 
transparency principle at their core to name a few. 


Fair and lawful processing. Conceptually, European law effectively prohibits any 
processing except where there is a legal basis (Art. 6 of 95/46/EC). This means 
that in a professional context, handling data without a legal basis is illegal. 
Non-compliance can result in penalties and may even lead to data protection authorities 
shutting down IT systems (see §38 of the German “Bundesdatenschutzgesetz”). A 
legal basis can be derived directly from legal regulation, e.g., when the law 
prescribes the storage of specific data for law enforcement purposes. In many cases, 
however, lawfulness can be achieved by obtaining consent, in an unambiguous 
form, from the person whose data is concerned, i.e., from the data subject (Art. 7(a) 
of 95/46/EC). 


Purpose limitation. The processing of personal data, even if the data was acquired 
lawfully (which should result in legitimacy), is still subject to further regulation. It 
regularly needs to follow, amongst others, the principle of purpose limitation (Art. 6.1(b) 
of 95/46/EC). Put simply, the purpose limitation principle states that data may only 
be collected, stored, processed or transferred for those purposes to which the data 
subject has consented or which the law allows. No further processing that 
would be incompatible with the original purpose is allowed. 


Data minimisation. This principle implies that if no purpose is at hand, the data has 
to be deleted or not even collected in the first place (Art. 6.1(c) 95/46/EC). Data 
minimisation can also be understood in a broader sense, namely as constructing systems 
in such a way that processing personal data can be avoided (so-called data avoidance). 
While the former is a legal requirement, the latter is not mandatory by law in all cases, 
but only where such technology is available under reasonable conditions (cf. Recital 46 
of 95/46/EC). However, it can be sensible to develop and use such approaches even 
in an enterprise context when there is no legal necessity, as it may lower compliance 
costs. 


Transparency and subject access rights. The principles regarding the processing 
itself are complemented by specific, enforceable rights for the data subject (e.g., Art. 12 
and 14 of 95/46/EC). The conceptual idea behind these rights is that the data subject 
should be able to find out what others know about him or her. In case this knowledge 
is illegitimate, the data subject should be able to stop the respective data controllers 
from using this knowledge, by blocking, correcting or deleting the personal data. 


Generally, the protection of data does not always prove to be easy in information 
systems. This is especially true for the protection of personal data. A core difficulty 
lies in the diversity of the processing steps that this data may undergo, while at the 
same time being subjected to the above principles. For compliance, it has to be en- 
sured that any algorithm or any service of an IT system that processes a specific set 
or piece of personal data is within the limits of the legal foundation (e.g., the con- 
sent) for the processing, and that it does not violate the purpose limitation principle. 
At the same time, it needs to be ensured that the data subject is able to find out what 
happened to his or her data, who accessed it, and what it has been or will be used 
for. 


16.3 Policy Language Requirements 


16.3.1 General Design Principles and Expressivity 


Measurability. This property of a policy language is fulfilled when the construction 
of the language allows one to check that the policy has been followed correctly. 
A mechanism to prove that a rule has been applied is useful but not sufficient to 
demonstrate this property, since some policies may be applied a long time after 
they have been stated. 


Unified model. Because Data Handling Policies are closely related to Access Control 
policies, a unified model is a key success factor for a policy language. Even 
though in this document we often focus on access control and data handling policies 
separately, they are in fact closely related. A server’s access control policy should 
not only specify what PII it wants from the user, but also how it is planning to treat 
the data. Here, the DHP is part of the ACP. On the other hand, a user’s DHP may 
specify which third parties are allowed to see the PII, so that the ACP becomes part 
of the DHP. 


Semantic compatibility with P3P. A policy language should be semantically com- 
patible with the Platform for Privacy Preferences (P3P [W3C06a]) as much as is 
possible. 


Stickable policies. Stickability is the property of the policy language that allows for 
attaching a policy to data no matter how, where and when the data is sent. 


Revocability. It must be possible for any policy user to revoke a Data Handling 
Policy the same way it is possible to revoke a credential related to an Access Control 
Policy. 


Transparency. A policy language must be able to express that the data flow trace 
resulting from the transfer of the data between entities should be kept. There should 
be mechanisms in place to log the usage of personal data. Such logs will span mul- 
tiple trust domains in case of downstream usage and parts of it may travel with the 
data (sticky logs). 


High-level (abstract) policies. The policies should be expressible not only on a low, 
i.e., more technical, level but also on a higher, i.e., more abstract, level. The benefits of 
this are, for example, that the policies can become shorter, easier to understand and 
easier to formulate. Among other techniques, ontologies could be leveraged to bring 
the policies to a high(er) level. Instead of talking about credit-card-number, 
credit-card-name, etc., an ontology could describe such data under the class 
credit-card data, or even more general, payment data. Then the policy 
can refer to concepts like credit-card or payment data if needed. 
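
The credit-card example can be made concrete with a small Python sketch. The ontology is modelled, as a simplifying assumption, as a mapping from each concept to its more general parent; the names PARENT, POLICY, is_a and permitted, and the purpose "billing", are hypothetical.

```python
# Hypothetical ontology: each concept points to its more general class.
PARENT = {
    "credit-card-number": "credit-card data",
    "credit-card-name": "credit-card data",
    "credit-card data": "payment data",
}

# A policy stated once, at the abstract level: allowed purposes per concept.
POLICY = {"payment data": {"billing"}}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is a descendant of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def permitted(item, purpose):
    """Check a concrete data item against the abstract policy."""
    return any(
        is_a(item, concept) and purpose in purposes
        for concept, purposes in POLICY.items()
    )


print(permitted("credit-card-number", "billing"))
print(permitted("credit-card-number", "marketing"))
```

A single rule on payment data thus covers every concrete item below it in the hierarchy, which is what makes the policy shorter and easier to formulate.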


Data minimisation. The policy language should support — and encourage — the 
minimisation of the amount of personal information that is revealed in order to gain 
access to a resource. The architecture should definitely not assume that all information 
about the subject is readily available when the access decision is made. Rather, 
the list of attributes that need to be revealed, or the predicate that needs to be proved, 
should be explicitly specified by the server, or perhaps even be the result of a nego- 
tiation between the client and the server. The client should then have the option to 
reveal only those attributes that are strictly necessary. Whether this is possible, of 
course, not only depends on the policy language, but also on the underlying authen- 
tication technology. 
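
A minimal Python sketch of this interaction follows. It assumes that the server's request consists of a list of attribute names and a set of named predicates, and that the client answers with exactly those and nothing more. The names answer_request and is_adult, the sample profile, and the fixed reference date are hypothetical. In this sketch the predicate is merely evaluated by the client; actually proving it to the server without revealing the birth date would require a suitable authentication technology such as anonymous credentials.

```python
from datetime import date


def is_adult(profile, today=date(2011, 1, 1)):
    """Predicate 'age >= 18', evaluated against a fixed reference date."""
    born = profile["birth_date"]
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    return age >= 18


def answer_request(profile, requested_attributes, requested_predicates):
    """Reveal only the requested attributes and the truth values of the
    requested predicates; everything else in the profile stays private."""
    disclosed = {name: profile[name] for name in requested_attributes}
    proved = {name: check(profile) for name, check in requested_predicates.items()}
    return disclosed, proved


profile = {
    "name": "Alice Example",
    "birth_date": date(1980, 5, 17),
    "country": "SE",
}

disclosed, proved = answer_request(profile, ["country"], {"over_18": is_adult})
print(disclosed, proved)
```

Here the server learns the country and the fact that the user is an adult, but neither the name nor the date of birth.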


Anonymous or pseudonymous access control. A user shall have the possibility to 
access a resource in an anonymous or pseudonymous way. For an anonymous ac- 
cess, the server makes sure that the user fulfills the necessary requirements, while the 
required attributes allow the user to stay anonymous. This is of course only possible 
if (1) the required attributes are applicable to a large number of people and the user 
can therefore not be identified, and (2) the underlying technology supports proving 
the attributes in an anonymous way (for example using the technology of anony- 
mous credentials). A pseudonymous access is similar to the anonymous one, with 
the difference that for every access a user makes, he provides some kind of identifier 
- a pseudonym - which the server uses to recognise that the same (pseudonymous) 
user requests access. However, the server only knows the pseudonym and not the 
real identity of the user. This is important if a user wants to keep some profile on 
the server side, while the user still wants to be anonymous to the server. From a 
legal point of view, services must be offered pseudonymously whenever that can be 
considered reasonable for the service. 


Meta-policies and policy generation. In some cases, it is necessary to constrain the 
possible policies that can be attached to data by rules or guidelines. These guidelines 
are locally enforced when defining preferences and policies. For instance, a data 
subject may define a rule (i.e., a policy) that forbids the creation of any preference 
allowing the use of medical data for advertisement. Such a policy could also be 
provided by a trusted third party. This can be achieved by defining meta-policies 
that specify how policies can be customised. The same mechanism can be used to 
specify constraints on policies that are generated, e.g., by a service in response to a 
user request. Those constraints can be derived from trust or access control assertions. 
This mechanism can rely on a way to express policy generation rules in the policy 
language itself. 
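
The medical-data example can be sketched as follows in Python. The representation is an assumption made for illustration: a preference maps data categories to allowed purposes, and the meta-policy is a set of forbidden (category, purpose) pairs that is checked locally whenever a preference is defined. The names FORBIDDEN and check_preference are hypothetical.

```python
# Meta-policy: combinations that no preference may allow.
FORBIDDEN = {("medical data", "advertisement")}


def check_preference(preference):
    """Return the (category, purpose) pairs of a preference that violate
    the meta-policy; an empty list means the preference is acceptable."""
    return sorted(
        (category, purpose)
        for category, purposes in preference.items()
        for purpose in purposes
        if (category, purpose) in FORBIDDEN
    )


draft = {
    "medical data": {"treatment", "advertisement"},
    "address": {"shipping", "advertisement"},
}
print(check_preference(draft))
```

An editor for preferences could refuse to save the draft above until the offending entry is removed, while still accepting the use of the address for advertisement.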


Data model primitives. The policy language must make consistent use of (at least) 
the concepts of date and time, and location. These concepts are essential for the 
expression of data usage constraints, e.g., some data can be displayed in some par- 
ticular locations or can be valid for a limited period of time. 


16.3.2 Requirements for Data Handling Policies 


Business logic to describe data usage. The business logic of an enterprise deter- 
mines what actually happens with the data after it is received. If this business logic 
is described in a standardised way, for example using WS-BPEL (Business Process 
Execution Language), then it should be possible to automatically derive the DHP 
from it, or perhaps the BPEL description itself could even be part of the DHP. 


New usage should trigger consent. The policy language must support a mechanism 
to acquire new consent from the data subject if the data controller wants to change 
the policy. However, the data subject can indicate in her preferences that she will 
never agree to any changes to the policy, and that she therefore does not want to be 
bothered with requests for policy changes. Alternatively, one could have an opt-in 
mechanism, where the data subject has to explicitly state in her preferences that she 
would consider changes to the policy. 


Legal policies need differentiated layering. A policy language must include the 
possibility to express and address at least three layers of human-readable text to 
describe a policy to the user. This is recommended by the Article 29 group’s Opinion 
on More Harmonised Information Provisions (Op. 100): a short version 
of the privacy policy, with an addressable substructure to be defined; a condensed 
version of the privacy policy, with an addressable substructure to be defined; and 
a full (lawyer-readable) version of the policy, with an addressable substructure to 
be defined. A fourth layer to express the policy with iconography should also be 
available, with a set of icons to be defined. 


Technical representation of legal policies. The policy language should be able to 
express legal policy concepts (e.g., liability, data controller, data processor, etc.) and 
conditions relevant for machine-based decision making, in a form supporting their 
digital storage, transmission, and processing. The semantics of the representation 
should be carefully considered and be compatible with the capabilities of an efficient 
processing engine. 


Constrained customisation of privacy policies. In specific cases, it is necessary to 
let the data subject create a sticky policy that is slightly different from the privacy 
policy of the data controller. We assume that 1) the data controller constrains which 
modifications are legitimate and 2) the data controller checks whether the provided 
sticky policy is indeed compliant with the initial policy. 


Support nested policies. The policy language must support nested policies. Thus, a 
policy could include a number of specific policies for further processing of the data. 


Express user preferences. A policy language must allow for users to express pref- 
erences about the handling of their data. In particular, it must be possible to express 
preferences for the use of given credentials for a given purpose. The user should 
also be able to express general trust relationships independently of a given scenario 
or purpose. The language should be extensible enough to express new user-defined 
preferences. 


Describe server policies. When the server has its own Data Handling Policies (one 
or multiple), the user’s Data Handling Policy should match one of the available 
server policies. 


Originator’s policy. A policy language must include a mechanism to identify what 
the data subject allows, no matter who transmits the data. 


Logging/Monitoring/Auditing Policies. It must be possible to inform the user 
about data collected during the usage of the service (date, location, actions, creden- 
tials used, etc.). It must be possible to express the scope of the retention (page, ses- 
sion, duration) and usage (extend user experience, debugging, legal requirements) 
of the collected data. It must be possible to express how the data is collected: how, 
when, where it is stored. 


Security levels. The level of privacy protection achieved by setting a policy does 
not only depend on the claims made by the subject, but also on the underlying tech- 
nology that is used to prove the validity of these claims. The policy designer should 
not be concerned with technical details such as cryptographic algorithms and key 
length, but given that the language should be useful in both low- and high-security 
environments, some notion of “security levels” seems appropriate. What these security 
levels imply for the underlying technology and infrastructure could then be 
specified in a separate ontology. The policy designer could use this ontology in a 
more practical way than defining the technical details himself. 


DHP ontology. The language should provide an ontology for data purposes and 
types of data. It should be extensible, as we cannot possibly foresee all items that 
should appear in this ontology. 


Enforcing DHP. Technological means to enforce Data Handling Policies are lim- 
ited. A trusted software infrastructure can assist in automatically adhering to a DHP 
(e.g., deleting data on time) and in logging access for audit purposes, but eventually 
these systems can always be circumvented by a malicious user (e.g., by forwarding a 
picture of the screen displaying the sensitive information). In the end, one will have 
to either trust the receiver to adhere to the DHP that was agreed upon, or to trust an 
external auditing agency to correctly certify such receivers. It should be possible to 
express this type of trust in the policy. 


Breaking the glass. A special case of a policy: the law prescribes certain areas 
where access and processing are compliant, although clearly not within the prior 
consent. In these cases, it might be reasonable to invoke special mechanisms for 
transparency (i.e., the glass is broken = the prior consent has been exceeded). Such a 
policy could state that certain entities are entitled to access the data, but that 
obligations to inform other parties, particularly the data subject or 
data protection supervisors, might then be triggered. 
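
One way to picture such a policy is the following Python sketch. It assumes a very simple model in which access within the consented purposes is granted silently, while access by an entitled role outside the consent is granted but queues a notification to the data subject. The function name request_access, the role names, and the notification format are hypothetical.

```python
def request_access(role, purpose, consented_purposes, break_glass_roles, notifications):
    """Decide on access and record any transparency obligation that is triggered.

    Returns True if access is granted. When the glass is broken, a
    notification for the data subject is appended to `notifications`.
    """
    if purpose in consented_purposes:
        return True  # covered by the prior consent, no extra obligation
    if role in break_glass_roles:
        notifications.append(
            ("data subject", f"{role} accessed data for '{purpose}' outside prior consent")
        )
        return True  # the glass is broken: access granted, obligation triggered
    return False


pending = []
consent = {"treatment"}
entitled = {"emergency physician"}

print(request_access("nurse", "treatment", consent, entitled, pending))
print(request_access("emergency physician", "emergency care", consent, entitled, pending))
print(request_access("insurer", "risk scoring", consent, entitled, pending))
print(pending)
```

The sketch only records the obligation; whether and when the notification is actually delivered is a matter for the enforcing infrastructure.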


Capture user intent. This property is needed to differentiate the user’s intent when 
using a service from the purpose of the service itself. User intent does not necessarily 
match the service purpose. The mechanism for capturing the user intent may be 
simple; processing it, however, requires the use of semantics and logic. 


Purpose of data processing. The purposes of data processing or data handling by 
a service usually stay the same across the usage of the service for all transactions. 
However, it is not always clear for the user whether a given piece of data is going to 
be reused for a purpose other than the most obvious one. For example, an address 
may be used for shipping, but also for later marketing actions. The policy language 
must allow for expressing all the different purposes of data handling by the service, 
so that the user can be informed about the less obvious purpose(s) of data processing. 


Express obligations. A policy language needs to express all the obligations of the 
processing party. Such obligations can be derived from the law, but also from consent. 
Full coverage by a machine-readable policy will never be achievable, as the 
law often requires human interpretation (that is why we have judges). Obligations 
should be able to cover the scope of purpose limitations, which is difficult to translate, 
since it is hard to describe whether an access, storage, or transfer was made 
for one or another purpose. 


Notification/feedback channels. It should be possible for the data subject to require 
that notification messages be sent back to him to inform him about the obligation 
enforcement conditions of his exported data. This is an important feature since few 
security systems are able to provide a report to the data subject about the usage of 
his private information. In case of misbehaviour, these notification messages can be 
used as a proof for accountability. It should be possible to link different notification 
messages for the same piece of exported data together, so that they form a trace of 
the data usage and possibly a proof of privacy policy violation. 


16.3.3 Requirements for Access Control policies 


Declarative language to represent access control policies. The policy model 
should be accompanied by a language that enables the specification of access con- 
trol policies. The language should be declarative and accompanied by a clear and 
unambiguous semantics for the policy specifications. 


References to policies. The policy model should provide support for reusing a pol- 
icy. Referencing can be done either directly or by inheritance. 


Role models - family, friends, wider access control. The access control model 
should support multiple access control paradigms, including role-based access con- 
trol and attribute-based conditions. Roles could also be incorporated in attribute- 
based conditions by the consideration of proper attributes. 
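
The observation that roles can be folded into attribute-based conditions can be illustrated with a minimal Python sketch. The attribute names (`roles`, `age`), the `Condition` type, and the `satisfies` helper are hypothetical and not part of any PrimeLife language; the sketch only shows that a role check is an ordinary predicate over an attribute.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Condition:
    """A predicate over one named attribute of the requester (hypothetical type)."""
    attribute: str
    predicate: Callable[[Any], bool]


def satisfies(attributes: Dict[str, Any], conditions: List[Condition]) -> bool:
    """Grant access only if every condition holds on the supplied attributes."""
    return all(
        c.attribute in attributes and c.predicate(attributes[c.attribute])
        for c in conditions
    )


# A role-based rule ("friends may read") is a condition on a 'roles' attribute.
friends_only = [Condition("roles", lambda roles: "friend" in roles)]

# An attribute-based rule can combine the role with other attributes.
adult_family = [
    Condition("roles", lambda roles: "family" in roles),
    Condition("age", lambda age: age >= 18),
]

alice = {"roles": {"friend"}, "age": 30}
bob = {"roles": {"family"}, "age": 16}
```

Under these assumptions, `satisfies(alice, friends_only)` holds, while `bob` fails both rules: he is not a friend, and he does not meet the age condition of `adult_family`.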


Information from third party sources. The policy model/language should be able 
to leverage information certified by a given third party (e.g., government). 


Technology-independent credentials. The policy model should support expres- 
sions on attributes contained in digital credentials. Different types of credentials 
may be integrated in the policy model/language, such as anonymous credentials, 
X.509 credentials, pseudonym/password, Kerberos tickets, etc. 


Attribute-based access control to data. The policy model/language should support 
policies making explicit references to attributes of involved parties (e.g., requester 
of access, data on which access is requested, respondent/owner of data). Attribute 
values can be provided by means of credentials or can be metadata associated with 
objects. 


Expiration date, validity. There should be an option for access control policies to 
expire after an amount of time. Access control policies should support conditions 
and reasoning about time. Time can impact the validity of certain conditions in the 
policies or be used to support policies that might be valid only up to, or after, a 
specific time (e.g., embargo on data, data that become public after a given time, data 
that should be deleted after a given time). 
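
As an illustration of such time conditions, the following Python sketch models a validity window with an optional start (e.g., the end of an embargo) and an optional end (e.g., the expiry of the policy). The `ValidityWindow` type and its field names are assumptions made for this example only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ValidityWindow:
    """Hypothetical time condition attached to an access control policy."""
    not_before: Optional[datetime] = None  # e.g., end of an embargo
    not_after: Optional[datetime] = None   # e.g., expiry of the policy

    def is_valid(self, now: datetime) -> bool:
        """Return True if 'now' lies inside the window (open ends are unbounded)."""
        if self.not_before is not None and now < self.not_before:
            return False
        if self.not_after is not None and now > self.not_after:
            return False
        return True


# Data that become accessible only after a given date (embargo).
embargo = ValidityWindow(not_before=datetime(2030, 1, 1, tzinfo=timezone.utc))

# A policy that expires at a given date.
expiring = ValidityWindow(not_after=datetime(2025, 1, 1, tzinfo=timezone.utc))
```

A policy evaluator would call `is_valid` with the current time before evaluating the remaining conditions of the policy.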


Time or event for the beginning of validity. Access may be granted or denied for
a user or group of users after an amount of time or after an event has occurred (for
instance, to support historians). The policy model may support event-based
conditions other than those expressed as a time. Event-based conditions make
policy restrictions/permissions valid at the occurrence of certain events.


Priority of policies or combination rules for policies. In case of contradicting
policies, we need a clear prioritisation; that is, a rule that determines which policy
supersedes all others and how the others are combined with that policy and with each
other (a hierarchy of policies is also conceivable). The policy model should
support a mechanism for combining policies according to different composition
operators. The policy model/language should be accompanied by a clear definition of
the possible composition operators and of their semantics.
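
To make the notion of composition operators concrete, the Python sketch below defines three simple operators over individual policy decisions. They are modelled loosely on the combining algorithms known from XACML (deny-overrides, permit-overrides, first-applicable) but are deliberately simplified; the `Decision` type and function names are assumptions of this example, and the requirement above does not prescribe these particular operators.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"
    NOT_APPLICABLE = "not_applicable"


def deny_overrides(decisions: Iterable[Decision]) -> Decision:
    """A single DENY wins; otherwise any PERMIT wins."""
    results = list(decisions)
    if Decision.DENY in results:
        return Decision.DENY
    if Decision.PERMIT in results:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


def permit_overrides(decisions: Iterable[Decision]) -> Decision:
    """A single PERMIT wins; otherwise any DENY wins."""
    results = list(decisions)
    if Decision.PERMIT in results:
        return Decision.PERMIT
    if Decision.DENY in results:
        return Decision.DENY
    return Decision.NOT_APPLICABLE


def first_applicable(decisions: Iterable[Decision]) -> Decision:
    """Policies are ordered by priority; the first applicable one decides."""
    for decision in decisions:
        if decision is not Decision.NOT_APPLICABLE:
            return decision
    return Decision.NOT_APPLICABLE
```

The same list of contradicting decisions yields different outcomes under different operators, which is why the semantics of each operator has to be fixed unambiguously.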


Data subject should be able to keep control over PII. A data subject should be
able to control (modify, delete, etc.) his PII that has been collected and stored by
a data controller. In this case, the access control of the data controller has to take
into account the obligation to let the data subject access his PII. The control of
the data subject over his PII may range from getting read access to stored PII, being
allowed to update collected PII, and accessing logs regarding the usage of collected PII,
to modifying the sticky policy referring to collected PII. For instance, a data subject
provides his home address (PII) to a data collector. Subsequently, the data subject
may use a dedicated endpoint (e.g., WS, Web page) offered by the data collector to
access and check logs related to the usage of his PII in order to figure out whether
his PII was used appropriately.


Choose strength of protection. The policy model should provide different levels
of protection and give users the ability to tune protection according to their needs.
For instance, a user may request that specific cryptographic measures be applied
to the communication or storage of private information.


Ontologies for credential types, delegation. Ontologies can be a powerful tool to 
adapt the language to the particular needs of a particular context while maintaining 
interoperability. For example, a common ontology on the structure of personal iden- 
tity information can be used to guarantee compatibility of digital passports issued 
by different countries. An ontology on countries and their governments can be used
to determine which instances are certified to issue passports for which countries. 


16.3.4 Requirements for Trust policies 


Link to Data Handling policies. The policy language should provide the possibility 
to link a Data Handling Policy with trust policies; this would permit the explicit rep- 
resentation of trust on the correct enforcement of the Data Handling Policy. When 
the binding with the trust policy is not expressed, trust is implicitly assumed (this is 
often the case when trust is established at a more global level). The binding between 
Data Handling and Trust Policy could occur at different levels, expressing require- 
ments on the data subject or on the source of the credentials referred to in the policy. 


Trust establishment. Several factors should be taken into account in the decision 
to trust another entity or not. A first factor could be the exchange of credentials. 
Trust could also be based on statements made by others, for example reputations or 
privacy seals. 


Statement and Certification. Certification validates that a server is authentic and 
trustworthy, so that the user can feel confident that their interaction with the server 
has not been overheard and that the server is who it claims to be. The certificate is 
provided by a third party that should be trusted by the user. 


• A trust statement is the explicit expression of a perceived trust level. It is made
  by the truster and represents the subjective judgement of the trustee’s
  trustworthiness, according to the truster’s point of view.

• A certificate is a digital document that describes a written statement from the
  issuer (certification authority, often considered trustworthy) about the identity
  of another party (in the form of public key and the identity of the owner) or a
  permission to use a specific service. It can be considered as a trust statement
  issued by a reputed third party.


Context-dependent Trust Mechanisms. Trust mechanisms should be chosen ac- 
cording to the application context. For example, a trust policy should express a rule 
specifying that the data subject provides her e-mail address to an online book store, 
if its reputation is greater than 8/10, and to an online tax declaration website, if it is 
certified by the government. 
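
The example rule above can be sketched in Python as follows. The `Recipient` type, the context labels, and the 0 to 10 reputation scale are assumptions introduced for illustration; only the two conditions (reputation greater than 8/10 for the book store, government certification for the tax website) come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Recipient:
    """Hypothetical description of the service asking for the e-mail address."""
    context: str                               # e.g., "book_store", "tax_declaration"
    reputation: float = 0.0                    # score out of 10 (assumed scale)
    certified_by: FrozenSet[str] = frozenset() # names of certifying parties


# One trust mechanism per application context.
TRUST_RULES: Dict[str, Callable[[Recipient], bool]] = {
    "book_store": lambda r: r.reputation > 8.0,
    "tax_declaration": lambda r: "government" in r.certified_by,
}


def may_release_email(recipient: Recipient) -> bool:
    """Release the e-mail address only if the rule for this context is satisfied."""
    rule = TRUST_RULES.get(recipient.context)
    # If no rule exists for the context, the address is not released.
    return rule is not None and rule(recipient)
```

Note that a high reputation does not help the tax website in this sketch: the mechanism is selected by the context, not by whichever metric happens to be favourable.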


Privacy breach. Privacy breaches include (but are not limited to) loss of control
over data, unauthorised access, social engineering, phishing, and malicious
proxy servers. In some cases, legislation requires that when a privacy breach occurs,
a notification be sent to the affected individual or organisation. Technological
means to enforce these rules have to be put in place, as well as a formal and
quantitative definition of privacy breach assessment to evaluate when a breach occurs
and what the level of associated risk is.


Link trust with Access Control. Conditions on policy may include trust evaluation,
e.g., allow data writing if the user complies with ACP and has a trust level greater
than X or if he has a certain certification.


End user trust. Trust affects all levels of end-user interaction with the system, i.e.,
whenever a user wants to access a service on the web. Trust should be assessed for
all layers involved in the transaction: user application, network, service provider.
Trust is by definition related to a personal perception, so each user has to be able to
edit trust policies and trust preferences in an intuitive way.


Building trust through a third party. Users may establish trust relationships using 
third party trust assessment. This may guarantee a maximum level of trust equal to 
the level of the certification authorities (best case scenario). Trust mechanisms have 
to support certificates as produced by certificate authorities (e.g., CAcert, Thawte, 
etc.) and the corresponding hierarchical mechanisms (web of trust). Trust reasoning 
has to allow for combining this information with other trust metrics (e.g., reputation 
based). 


Trust reasoning. A trust policy’s evaluation component should be able to reason 
about trust, including composing various trust metrics (e.g., reputation system, PKI 
...) and hierarchical structures. 


Trust ontologies. Trust credentials and trust assessment mechanisms should be
represented in an ontology. This ontology should categorise the different types and
sub-types of credentials. To each type of credential we can attach information
about the trust assessment mechanism supporting it.


Transparency, reciprocity. Transparency should be considered as one component
of trust assessment (typically, transparency increases trust perception). If a data
holder is able to monitor at any time the usage of his data by a server, his trust
will increase. Since transparency techniques may rely on historical
data, previous behaviours, access to log files, etc., this requirement is strongly related
to the requirement on Logging/Monitoring/Auditing Policies. Reciprocity should be
taken into account as a characteristic of some trust interactions, but trust is not
reciprocal in general: e.g., I trust a mail provider to store my personal mails, but it
does not necessarily trust me to store the same kind of information.


Specification of liabilities. 


• Towards data subject: data protection obligations under Directive 95/46/EC have
  to be fulfilled by data controllers. Data controllers are liable for data protection
  violations unless they can prove they are not responsible for the damage. It is
  necessary to differentiate between data controller and data processor. The role
  of data processor is reduced: he solely processes personal data as directed by
  the controller. The policy language should be able to express the role of each
  entity for each action to determine who is liable. Liability: compensation from
  the controller for the damage suffered. Remedy: a right to a judicial remedy for
  any breach of guaranteed rights. Sanctions: to be defined by member states in
  their national laws.

• Towards relying parties: for ensuring data accuracy, the following policies are
  important: validation of data at the moment of collection, procedures for reporting
  and dealing with suspected inaccuracies, regular updates, restriction of
  modification rights to authorised entities.


16.3.5 Other Technical Requirements for PrimeLife 


The following requirements are specific to composition scenarios and use cases 
where anonymous credentials are used. They are lower level requirements that in- 
fluenced technical design and some can also be viewed as technical choices. 


Policy Composition. 


• Support for composition of service policies and composition of user preferences.

• Cascading policies: When rules are defined at different levels (e.g., corporate,
  service, and action), mechanisms to select and aggregate appropriate rules
  should be provided.

• Prioritisation of rules: Priorities are only necessary to resolve conflicts between
  rules. Depending on the expressiveness of the language, priorities may be
  required.

• Generalisation of policies.

• Multi-round policy definition.

• Policy negotiation: Negotiation only makes sense when the user and/or the
  service have a trade-off to make.

• Delegation of Rights.

• Revocation of Rights.

• Composition of Access Control Policies.

• Prior agreement and contracts.

• Privacy-aware audit mechanism.

• Support for data and PII: Legislation treats personal data (PII) differently from
  other types of data.

• Dynamic Trust: mechanisms to bootstrap, modify, and revoke trust are necessary.

• Scope: Trust is not unconditional. The scope of the trust relationship has to be
  defined.

• Proof of enforcement.


Use of Anonymous Credentials. 


• Technology-independent certification of data by trusted third parties.

• Trust in certified data.

• Predicates over attributes, extensible with ontologies.

• Expression of proved statement (by using same policy language).

• DHP of Derived PII: sensitive information that is computed based on actual PII.

• Alternative data recipient + associated access conditions.

• Notion of atomic credentials.

• Limited spending: The policy language should have provisions to express that
  one can only authenticate oneself with the same credential for a limited number
  of times.

• Signing statements.


16.4 State of the Art 


Technical improvements of Web technologies have fostered the development of online
applications that use private information of users to offer enhanced services. As
a consequence, the vast amount of personal information thus available on the Web
has led to growing concerns about the privacy of its users, who need to be able
to communicate in a secure global networked infrastructure while at the same time
preserving their privacy. Support for digital identities and credentials, and definitions
of access control and privacy-enhanced languages, protocols, and techniques
for their management and exchange then become fundamental requirements.


16.4.1 Access Control Policy Languages 


Several access control models and languages presented in the literature [DFJS07,
SD01] are based on logic expressions, and prescribe access decisions on the basis of
some properties that the requesting party may have. These properties can be proven 
by presenting one or more credentials [BS02, TY05, LMW05, NLW05, YWS03]. 
Credential-based access control can be seen as a generalisation of a variety of ac- 
cess control models. In (hierarchical) role-based access control (RBAC) [FK92, 
SCFY96], the decision to grant or deny access to a user is based on the roles that 
were assigned to her. Clearly, one could encode the roles of a user in a credential, 
so that RBAC becomes a special case of credential-based access control. However, 
RBAC is not powerful enough to support the concept of credential. 

Attribute-based access control (ABAC) [BDDS01, eXt05, WWJ04] comes closer 
to the concept of credential-based access control, since it grants access based on the 
attributes of a user. The de facto ABAC standard eXtensible Access Control Markup 
Language (XACML) [eXt05] is an OASIS standard that proposes an XML-based 
language for specifying and exchanging access control policies over the Web (see 
also Sections 18.4 and 19.3.1). The language can support the most common se- 
curity policy representation mechanisms and has already found significant support 
by many players. Moreover, it includes standard extension points for the defini- 
tion of new functions, data types, and policy combination methods, which provide 
a great potential for the management of access control requirements in future en- 
vironments. Though XACML represents the most accepted, complete, and flexible 
solution in terms of access control languages, it only allows the specification of the 
issuer of the attributes, but does not see them as grouped together in atomic
credentials. Moreover, the architecture paradigm is far from privacy-friendly: the user is
assumed to provide the policy decision point (PDP) with all her attributes, and lets 
the PDP decide on the basis of its access control policy. The policy that needs to be 
satisfied is not known to the user, leaving no opportunity for data minimisation. 

The first proposals that investigated the application of credential-based access
control regulating access to a server were made by Winslett et al. [SWW97, WCJS97].
Access control rules are expressed in a logic language, and rules applicable to a 
service access can be communicated by the server to the clients. A first attempt to 
provide a uniform framework for attribute-based and credential-based access con- 
trol specification and enforcement is presented by Bonatti and Samarati [BS02]. 
The authors propose a language for specifying service access and information re- 
lease rules based on credentials with certain properties. Access rules are specified 
as logical rules, with some predicates explicitly identified. Attribute certificates are 
modeled as credential expressions. In addition, this proposal also permits reasoning 
about declarations (i.e., unsigned statements) and profiles of the users that a server 
can make use of to reach an access decision. 

Besides solutions for uniform frameworks supporting credential-based access
control policies, different automated trust negotiation proposals have been
developed [LWBW08, SWY01, YW03]. Trust is established gradually by disclosing
credentials and requests for credentials. The work in [WSJ00] describes how trust can
be established through the exchange of credentials. The authors present a
credential-based access control language that is used for protecting the user credentials. The
work by Ni et al. [NLW05] takes the idea of [WSJ00] to cryptographic credentials
and defines a grammar for a revised version of the policy language. Trust
management systems (e.g., KeyNote [BFIK98], PolicyMaker [BFL96], REFEREE
[CFL+97], and DL [LGF00]) use credentials to describe specific delegations of trust
among keys and to bind public keys to authorisations. They therefore depart from
the traditional separation between authentication and authorisation by granting
authorisations directly to keys (bypassing identities).

Other works (e.g., [GPSS05]) have also investigated solutions for providing
authentication and access control based on biometrics [GLM+04]. In this context,
Cimato et al. [CGP+08] propose a privacy-aware biometric authentication technique
that uses multiple biometric traits.


16.4.2 Data Handling Policy Languages 


Some works have also focused on the definition of privacy policy languages [ACDS08,
AHKS02, eXt05, W3C06a, Web06] that offer preliminary solutions to the privacy
protection issue, for instance by providing functionalities for controlling
secondary use of data (i.e., how personal information can be managed once
collected). The Platform for Privacy Preferences (P3P) [Cra02, W3C06a] is a World
Wide Web Consortium (W3C) project that addresses the need of a client to as- 
sess whether the privacy practices adopted by a server comply with her privacy 
preferences before the release of personal information (see also Section 19.3.2). To
this aim, P3P provides an XML-based language and a mechanism for ensuring that 
clients can be informed about the privacy policies of the server. The corresponding 
language that would allow clients to specify their preferences as a set of rules is 
called A P3P Preference Exchange Language (APPEL) [W3C02]. Privacy preferences
are specified and evaluated by the clients before data releases. SecPAL for
Privacy (S4P) [BMB09] is a logic-based language to specify privacy policies and
preferences. S4P specifies preferences as ‘may’ assertions (i.e., authorisations) and
‘will’ queries (i.e., obligation requests). S4P specifies policies as ‘will’ assertions
(i.e., commitments on obligations) and ‘may’ queries (i.e., authorisation requests).
The work in [ACDS08] presents a solution for secondary use management that inte- 
grates a credential-based access control system with data handling policy definition, 
evaluation, and enforcement. A data handling policy language provides the users 
with the possibility to specify data recipients, usage purposes, and obligations, thus 
regulating how their personal information can be subsequently used by external par- 
ties receiving it. 

Data handling is sometimes also referred to as usage control [PS04, HBP05]. The
Obligation Specification Language (OSL) [HPB+07] supports a wide range of usage
control requirements related to time, cardinality, purpose, and events. OSL is
logic-based, so that a sequence of events can automatically be checked for compliance
with the specified policy. The OSL extensions for policy evolution [PSSW08] target
a use case similar to the PrimeLife use case of downstream usage. However, in
OSL it is the data provider who unilaterally creates the policy to be adhered to,
whereas PrimeLife aimed to develop a language in which such a policy is the result
of matching a consumer’s policy against the provider’s preferences.

Even if scenarios and trust models are different, there are clear similarities be- 
tween digital rights management (DRM), enterprise rights management (ERM), and 
data handling policies. Indeed, in each case, a data provider attaches constraints, in 
the form of a license or a sticky policy, to data sent to a data consumer. The domain- 
specific vocabularies for privacy policies and rights expression languages (RELs) 
may be rather different, but the same overall language structure can be used for 
both. The state-of-the-art ERM and DRM languages are MPEG-21 REL [Wan04],
XrML [Con02], and ODRL [ODR02].


16.4.3 Anonymous Credential Systems and Private Information 
Management 


Some work has also been done in the context of anonymous credential systems [IDE].
An anonymous credential [CL01, Cha85] can be seen as a signed list of attribute-value
pairs issued by a trusted issuer. Such credentials have the advantage that the owner can
reveal only a subset of the attributes, or even merely prove that they satisfy some
conditions, without revealing any more information about the other attributes. Also,
they provide additional privacy guarantees like unlinkability, meaning that even
with the help of the issuer a server cannot link multiple visits by the same user
to each other, or link a visit to the issuing of a credential. There are two main
anonymous credential systems in use today, namely Identity Mixer [CL01, CV02] and
U-Prove [U-P07]. Both are privacy-enhancing public-key infrastructures allowing
users to selectively disclose attributes from their credentials and prove conditions
over their attributes without revealing their full values. The U-Prove system
provides one-time credentials, whereas Identity Mixer credentials can be used multiple
times. Identity Mixer also has a number of interesting associated cryptographic
tools, such as verifiable encryption [CD00] that permits proving properties about
encrypted values, and limited spending [BCC05] that allows for restrictions to be
placed on how often the same credential can be used to access a service, without
compromising anonymity.

Existing languages are not targeted to anonymous transactions and thus lack the 
ability for expressing semantics for obtaining accountability in anonymous and un- 
linkable transactions, which can be achieved through the capability of disclosure 
to third parties. The latter is a crucial requirement when considering a practical lan- 
guage for data minimisation scenarios, such as anonymous transactions, particularly 
when considering the current legislation trend. The first paper towards third-party 
disclosure is by Backes et al. [BCS05]. In [GD06], P3P is extended such that it 
allows for describing credentials and their properties, which are necessary for gain- 
ing service access. The language is XML-based, and credential descriptions also 
allow for verifiable encryptions as a special case of third party attribute disclosure. 
A language featuring a credential typing mechanism and advanced features such as
spending restrictions and signing requirements was recently proposed by Ardagna
et al. [ACK+10].

Recent research on credential-based access control (e.g., [BS02, TY05, LWBW08,
RZN+05, YWS03]) has focused on client side issues and proposed solutions for
regulating the release of users’ private information (also in the form of anonymous
credentials) and possibly managing negotiation with the server. Chen et al. [CCKT05]
propose a solution that associates costs with credentials and policies to minimise the 
cost of a credential release within a trust-negotiation protocol. Karger et al. [KOB08] 
describe a logic-based language for the specification of privacy preferences dic- 
tating a partial order among the client properties. Both solutions provide a treat- 
ment of preferences or scores associated with either credentials or properties. Yao et 
al. [YFAT08] propose a point-based trust management model, where the client la- 
bels each credential in its portfolio with a quantitative privacy score, while the server 
defines a credit for each credential released by the client and a minimum threshold of 
credits to access a resource. The proposed solution finds an optimal set of client cre- 
dentials, such that the total privacy score of disclosed credentials is minimal and the 
server access threshold is satisfied. Finally, Ardagna et al. [ADF+10a, ADF+10b]
define solutions that permit the client to define its privacy preferences in terms of 
sensitivity labels of portfolio components and to minimise the disclosure of sensi- 
tive information, independently from the server preferences. These proposals pro- 
vide a complete modeling of the client portfolio, consider emerging technologies 
such as anonymous credentials, and capture sensitive associations and disclosure 
constraints of client properties and credentials. The research community has also
focused on protecting the privacy of users once their data are released to and stored 
by servers. Several approaches have been presented including techniques based on 
k-anonymity and k-anonymous data mining. A summary of existing data protection 
techniques is presented in [CDFS07a, CDFS07b, CDFS08]. 
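
The optimisation problem behind the point-based model of Yao et al. described above can be sketched in a few lines of Python. This is not the algorithm of [YFAT08]: an exhaustive search over subsets is used here for clarity, which is only practical for small portfolios. The function name, the credential names, and the numeric scores are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Tuple


def minimal_disclosure(
    privacy_score: Dict[str, int],
    credit: Dict[str, int],
    threshold: int,
) -> Optional[Tuple[FrozenSet[str], int]]:
    """Find a set of credentials whose credits reach the server's threshold
    while the sum of the client's privacy scores is minimal.

    Returns (credentials, total privacy score), or None if no set suffices.
    """
    names = sorted(privacy_score)
    best: Optional[Tuple[FrozenSet[str], int]] = None
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            # Server side: the released credentials must earn enough credits.
            if sum(credit.get(name, 0) for name in subset) < threshold:
                continue
            # Client side: keep the subset with the lowest privacy cost.
            cost = sum(privacy_score[name] for name in subset)
            if best is None or cost < best[1]:
                best = (frozenset(subset), cost)
    return best


# Hypothetical portfolio: client-assigned privacy scores, server-assigned credits.
scores = {"student_id": 2, "drivers_licence": 5, "credit_card": 8}
credits = {"student_id": 3, "drivers_licence": 6, "credit_card": 10}
```

With a threshold of 9, releasing `student_id` and `drivers_licence` (privacy cost 7) is preferable to releasing `credit_card` alone (cost 8), even though the latter discloses fewer credentials.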


To conclude, existing solutions usually provide access control languages and so- 
lutions that are logic-based, powerful, highly expressive, and permit the specifica- 
tion of complex conditions involving credentials and relations between parties in a 
simple yet effective way. However, in real world scenarios, such as the one consid- 
ered in PrimeLife, fundamental requirements for access control solutions are sim- 
plicity and ease of use, rather than the presence of a complete and highly expressive 
access control language. Also, despite the benefits of all these works (e.g., credential
integration), few of them provide functionalities for protecting the privacy of
users and regulating the use of their personal information in secondary applications.


Chapter 17 


Matching Privacy Policies and Preferences: 
Access Control, Obligations, Authorisations, and 
Downstream Usage 


Laurent Bussard, Gregory Neven, and Franz-Stefan Preiss 


Abstract This chapter describes how users’ privacy preferences and services’ privacy
policies are matched in order to decide whether personal data can be shared
with services. Matching has to take into account data handling, i.e., do services
handle collected data in a suitable way according to user expectations, and access
control, i.e., does the service that will be granted access to the data comply with user
expectations. Whereas access control describes the conditions that have to be
fulfilled before data is released, data handling describes how the data has to be treated
after it is released.

Data handling is specified as obligations that must be fulfilled by the service and 
authorisations that may be used by the service. An important aspect of authorisation, 
especially in light of the current trend towards composed web services (so-called 
mash-ups), is downstream usage, i.e., with whom and under which data handling 
restrictions data can be shared. 


17.1 Privacy Specifications: Preferences, Policies, and Sticky 
Policies 


The scenario we consider is one where two parties, typically a user and a server, 
engage in an interaction where one of the parties, typically the server, requests some 
personal data, e.g., personally identifiable information (PII), from the other party 
(See Figure 17.1). We will from now on call the party that provides the data the 
data subject and the party that requests the data the data controller. Moreover, we 
consider a scenario where, at a later point in time, the data controller may want to 
forward personal data to a third party, called the downstream data controller. 

[Figure: the Data Controller sends “request PII, policy” to the Data Subject, who
holds Preferences; the Data Subject returns “PII, sticky policy” to the Data
Controller, who holds the Policy and stores the Sticky Policy.]

Fig. 17.1: Matching data subject’s privacy preferences and data controller’s privacy
policy.


Both the data subject and the data controller have their own policies expressing
the required and proposed treatment of personal data, respectively. These policies
contain access control and data handling requirements. Personal data are only sent
to a data controller after (1) the access control requirements have been met, and (2)
a suitable data handling policy has been agreed upon. We distinguish three kinds of
policies:


Preferences: In his preferences, the data subject describes, for specific pieces of 
personal data, which access control requirements a data controller has to satisfy 
in order to obtain this personal data, as well as the data handling requirements 
according to which personal data has to be treated after transmission. These re- 
quirements may include downstream usage requirements, meaning the require- 
ments that a downstream data controller has to fulfill in order to obtain personal 
data from the (primary) data controller. 

Policy: The policy is the data controller’s counterpart of the data subject’s prefer- 
ences. In a policy the data controller defines, for specific pieces of personal data 
to be obtained, his certified properties (roles, certificates, etc.) that can be used 
to fulfill access control requirements, and a data handling policy describing how 
he intends to use personal data. 

Sticky policy: The sticky policy describes the mutual agreement concerning the 
usage of a transmitted piece of personal data. This agreement is the result of a 
matching process between a data subject’s preferences and a data controller’s 
policy. Technically a sticky policy is quite similar to preferences as described 
above, but it describes a mutual agreement between the data subject and the data 
controller that cannot be changed. After receiving personal data, the data con- 
troller is responsible for storing and enforcing the sticky policy. 


Figure 17.2 provides the overall structure of the language. A similar structure is 
used to specify policies, preferences, and sticky policies. 

Applicability specifies which personal data are targeted by the policy. Applicability
in preferences specifies the type of data (or a specific data item) targeted by the
preference. Applicability in policies defines which parameter (type of data collected
through a given interface) is targeted by the policy. Applicability is not part of sticky
policies.

<Policies>
  <Policy>
    <Applicability> ... </Applicability>
    <ACUC>
      <AccessControl> ... </AccessControl>
      <UsageControl>
        <Rights> ... </Rights>
        <Obligations> ... </Obligations>
      </UsageControl>
    </ACUC>
  </Policy>
</Policies>


Fig. 17.2: Structure of policies.


ACUC groups access control and data handling. AccessControl defines claims
required to gain access to the personal data. AccessControl in preferences defines
properties of services that can gain access under this preference. AccessControl in
policies defines properties of the service exposing the policy. AccessControl is not
defined in sticky policies.

UsageControl¹ specifies how data is handled. It defines Rights, i.e., what the
service is authorised to do with the data, and Obligations, i.e., obligations that must
be fulfilled by the service. Preference-side Rights are the rights the user is willing to
grant in a specific context. Policy-side Rights are the rights required by the service
for a specific piece of collected data. Preference-side Obligations are the obligations
required by the user in a specific context. Policy-side Obligations are the obligations
the service is willing to fulfill for a specific piece of collected data.
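The nesting of Figure 17.2 can be reproduced programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `build_policy` and the placeholder strings are assumptions made for illustration, and the real language defines structured child elements rather than plain text.

```python
import xml.etree.ElementTree as ET


def build_policy(applicability, access_control, rights, obligations):
    """Return a <Policy> element laid out as in Fig. 17.2.

    The four arguments are plain placeholder strings (an assumption of
    this sketch); the actual language uses structured sub-elements.
    """
    policy = ET.Element("Policy")
    ET.SubElement(policy, "Applicability").text = applicability
    acuc = ET.SubElement(policy, "ACUC")
    ET.SubElement(acuc, "AccessControl").text = access_control
    usage = ET.SubElement(acuc, "UsageControl")
    ET.SubElement(usage, "Rights").text = rights
    ET.SubElement(usage, "Obligations").text = obligations
    return policy


# Hypothetical policy for an e-mail address parameter.
policies = ET.Element("Policies")
policies.append(
    build_policy(
        "e-mail address",
        "certified shop",
        "UseForPurpose(delivery)",
        "delete within 6 months",
    )
)
print(ET.tostring(policies, encoding="unicode"))
```

A sticky policy would, following the description above, omit the Applicability and AccessControl children and keep only the usage control part.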


17.2 Matching Data Handling 


Given a data subject’s preferences and a controller’s policies, matching aims at
automating the process of deciding whether the data subject can safely transmit a piece
of personal data. We introduce a ‘more or equally permissive than’ operator
($\trianglerighteq$) to match preferences with policies. Intuitively, more permissive
means more rights and/or fewer obligations. We say there is a match when the
preferences are more or equally permissive than the policy.


17.2.1 Boolean Match 


Matching preferences and policies boils down to matching individual rights and
obligations. Data controllers generally expose an interface (e.g. HTML forms or
WSDL) specifying the type of requested personal data. Applicability is used to
associate each parameter $p$ of this interface with one privacy policy
$\mathit{Pol}_p$. On the data subject side, for each possible personal data $pii$,
applicability determines the set of relevant privacy preferences
$\mathit{Prefs}_{pii}$. When $\mathit{Prefs}_{pii} \trianglerighteq \mathit{Pol}_p$,
the assignment $p \leftarrow pii$ does match. More precisely, a set of preferences is
matched with a policy as follows:


\[
\mathit{Prefs} \trianglerighteq \mathit{Pol} \;\Leftrightarrow\; \exists\, \mathit{Pref} \in \mathit{Prefs} \cdot (\mathit{Pref}.\mathit{ACUC} \trianglerighteq \mathit{Pol}.\mathit{ACUC}) \tag{17.1}
\]


¹ In this chapter we consider usage control (UC) and data handling (DH) as synonyms.


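When multiple assignments are possible, e.g. the data controller requires an e-mail
address $p$ and the data subject has a corporate address $pii_1$ and a private address
$pii_2$, the matches $\mathit{Prefs}_{pii_1} \trianglerighteq \mathit{Pol}_p$ and
$\mathit{Prefs}_{pii_2} \trianglerighteq \mathit{Pol}_p$ are both evaluated
before picking an assignment, i.e. during identity/data selection.

In the following, we use the notations $*_{\mathit{Pref}}$ and $*_{\mathit{Pol}}$ to
denote elements within preferences and policies respectively. Pairs of access control
and data handling policies are matched as follows:


\[
\mathit{ACUC}_{\mathit{Pref}} \trianglerighteq \mathit{ACUC}_{\mathit{Pol}} \;\Leftrightarrow\; (\mathit{ACUC}_{\mathit{Pref}}.\mathit{AC} \trianglerighteq \mathit{ACUC}_{\mathit{Pol}}.\mathit{AC}) \wedge (\mathit{ACUC}_{\mathit{Pref}}.\mathit{UC} \trianglerighteq \mathit{ACUC}_{\mathit{Pol}}.\mathit{UC}) \tag{17.2}
\]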


Note that (17.2) is evaluated multiple times during the evaluation of (17.1):
$\mathit{ACUC}_{\mathit{Pref}}$ is instantiated successively with
$\mathit{Pref}_i.\mathit{ACUC}$ for all $\mathit{Pref}_i$ in $\mathit{Prefs}$.
Matching Access Control policies is out of the scope of this chapter. When
the data subject specifies rules (e.g. using XACML) and the data controller has
attributes (e.g. X.509 or SAML), matching is implemented as an access control
decision. Data handling requirements are matched as follows:


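\[
\begin{aligned}
\mathit{UC}_{\mathit{Pref}} \trianglerighteq \mathit{UC}_{\mathit{Pol}} \;\Leftrightarrow\; & (\forall R \in \mathit{UC}_{\mathit{Pol}}.\mathit{Rights} \cdot \exists R' \in \mathit{UC}_{\mathit{Pref}}.\mathit{Rights} \cdot R' \trianglerighteq R) \;\wedge \\
& (\forall O \in \mathit{UC}_{\mathit{Pref}}.O \cdot \exists O' \in \mathit{UC}_{\mathit{Pol}}.O \cdot O \trianglerighteq O')
\end{aligned}
\tag{17.3}
\]


Matching individual obligations ($O \trianglerighteq O'$) is specified in Section 17.3.
Matching individual authorisations ($R' \trianglerighteq R$) is defined in Sections 17.4
and 17.5.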
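The three matching rules (17.1) to (17.3) translate almost literally into nested quantifiers. The sketch below is an illustration only: the class and function names are hypothetical, and the comparison of access control parts, individual rights and individual obligations is passed in as callables, since those comparisons are defined elsewhere in the chapter (or are out of its scope).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

Geq = Callable[[Any, Any], bool]  # "first is more or equally permissive than second"


@dataclass
class UsageControl:
    rights: List[Any] = field(default_factory=list)
    obligations: List[Any] = field(default_factory=list)


@dataclass
class ACUC:
    ac: Any
    uc: UsageControl


def uc_geq(pref: UsageControl, pol: UsageControl, right_geq: Geq, obl_geq: Geq) -> bool:
    """Rule (17.3): every requested right is granted, every required
    obligation is covered by some obligation the controller offers."""
    rights_ok = all(any(right_geq(r_pref, r) for r_pref in pref.rights) for r in pol.rights)
    obls_ok = all(any(obl_geq(o, o_pol) for o_pol in pol.obligations) for o in pref.obligations)
    return rights_ok and obls_ok


def acuc_geq(pref: ACUC, pol: ACUC, ac_geq: Geq, right_geq: Geq, obl_geq: Geq) -> bool:
    """Rule (17.2): both the AC part and the UC part must match."""
    return ac_geq(pref.ac, pol.ac) and uc_geq(pref.uc, pol.uc, right_geq, obl_geq)


def prefs_geq(prefs: List[ACUC], pol: ACUC, ac_geq: Geq, right_geq: Geq, obl_geq: Geq) -> bool:
    """Rule (17.1): one matching preference is enough."""
    return any(acuc_geq(p, pol, ac_geq, right_geq, obl_geq) for p in prefs)


# Toy comparators (assumptions): AC always matches, rights/obligations match on equality.
always = lambda a, b: True
same = lambda a, b: a == b

pref = ACUC(None, UsageControl(rights=["delivery", "marketing"], obligations=["delete"]))
modest = ACUC(None, UsageControl(rights=["delivery"], obligations=["delete", "notify"]))
greedy = ACUC(None, UsageControl(rights=["profiling"], obligations=["delete"]))

modest_matches = prefs_geq([pref], modest, always, same, same)
greedy_matches = prefs_geq([pref], greedy, always, same, same)
```

In this toy run the modest policy asks for a subset of the granted rights and offers a superset of the required obligations, so it matches; the greedy policy asks for a right that was never granted.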


17.2.2 Going Further than Boolean Match 


Formulas presented in this chapter give an idea of the logic used to compare policies. 
However, the implementation is more complex because a Boolean result is generally 
not sufficient. 

In case of a match, a sticky policy expressing the agreement between the data
subject and the data controller has to be issued. When personal data $pii$ is assigned
to parameter $p$, the resulting sticky policy $\mathit{SP}_{p,pii}$ must fulfill the
following condition:
$\mathit{Prefs}_{pii} \trianglerighteq \mathit{SP}_{p,pii} \trianglerighteq \mathit{Pol}_p$.

In case of a mismatch, more details are also required. First, the cause of the
mismatch has to be identified. Second, remediation can be proposed in order to modify
the preferences and get a match. In order to identify the source of the mismatch, and
to propose a remediation, we use a mechanism to measure the similarity of pieces
of policy. Similarity makes it possible for the user to choose between canceling the
transaction and changing her preferences in a privacy-friendly way. When assigning
personal data $pii$ to parameter $p$ leads to a mismatch
($\mathit{Prefs}_{pii} \ntrianglerighteq \mathit{Pol}_p$), new preferences
$\mathit{Prefs}'_{pii}$ may be proposed in order to have
$\mathit{Prefs}'_{pii} \trianglerighteq \mathit{Pol}_p$ while minimising
the differences between $\mathit{Prefs}_{pii}$ and $\mathit{Prefs}'_{pii}$.


17.3 Obligations 


It is not possible to develop an exhaustive list of obligations because too many 
domain-specific obligations can be envisioned, e.g. the obligation to notify the data 
subject’s doctor in a health scenario. For this reason, we developed a set of usual 
obligations (e.g. data retention) and extension points for specifying upcoming or 
specific obligations. We define an obligation with one action and n triggers as: 


Do action when { trigger_1 ∨ trigger_2 ∨ ... ∨ trigger_n }


where action defines the action to execute to fulfill the obligation and trigger spec- 
ifies the event and conditions requiring the execution of this action. When an obli- 
gation specifies multiple triggers, each event corresponding to one or more triggers 
will result in the execution of the action. Obligations cannot have multiple actions 
because rollback would be too complex. For instance, six months data retention is 
expressed as the obligation to “delete data within 6 months”: 


Do DeletePersonalData() when { AtTime(t,maxDelay) } 


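where t is the current date and maxDelay is six months. To compare obligations, we
define:


\[
\begin{aligned}
O_{\mathit{Pref}} \trianglerighteq O_{\mathit{Pol}} \;\Leftrightarrow\; & (\forall T \in O_{\mathit{Pref}}.\mathit{triggers} \cdot \exists T' \in O_{\mathit{Pol}}.\mathit{triggers} \cdot T \trianglerighteq T') \;\wedge \\
& (O_{\mathit{Pref}}.\mathit{action} \trianglerighteq O_{\mathit{Pol}}.\mathit{action})
\end{aligned}
\tag{17.4}
\]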
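As a rough illustration of Formula 17.4, an obligation can be modelled as one action plus a list of triggers, with the per-trigger and per-action comparisons supplied by the caller. The names `Obligation` and `obligation_geq`, and the use of plain integers (a maximum delay in days) as stand-ins for triggers, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class Obligation:
    action: Any
    triggers: Sequence[Any]


def obligation_geq(
    pref: Obligation,
    pol: Obligation,
    trigger_geq: Callable[[Any, Any], bool],
    action_geq: Callable[[Any, Any], bool],
) -> bool:
    """Formula 17.4: each preference-side trigger must be more or equally
    permissive than some policy-side trigger, and the actions must match."""
    triggers_ok = all(
        any(trigger_geq(t, t_pol) for t_pol in pol.triggers) for t in pref.triggers
    )
    return triggers_ok and action_geq(pref.action, pol.action)


# Toy model: a trigger is a maximum delay in days; a longer delay is more permissive.
longer_or_equal = lambda t_pref, t_pol: t_pref >= t_pol
same_action = lambda a_pref, a_pol: a_pref == a_pol

required = Obligation("DeletePersonalData", [365])   # user: delete within one year
offered = Obligation("DeletePersonalData", [180])    # service: delete within six months
too_slow = Obligation("DeletePersonalData", [730])   # service: delete within two years

offer_ok = obligation_geq(required, offered, longer_or_equal, same_action)
slow_ok = obligation_geq(required, too_slow, longer_or_equal, same_action)
```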


Sections 17.3.1 and 17.3.2 give more details on triggers and actions respectively.


17.3.1 Triggers


We defined seven triggers, namely At Time, Periodic, Personal Data Accessed For 
Purpose, Personal Data Deleted, Personal Data Sent, Data Lost, and On Violation. 

Trigger “TriggerAtTime” is defined in Figure 17.3. This trigger has two parameters:
“Start,” i.e. when the trigger may be started, and “MaxDelay,” i.e. the response
time. In other words, this trigger is correctly enforced by triggering the associated
action once between “Start” and “Start + MaxDelay”. For instance, “delete within
one year” and “delete next month” are translated into TriggerAtTime(now, 1 year)
and TriggerAtTime(x, y − x) respectively, where x is the first day of next month and
y is the last day of next month.


<xs:complexType name="TriggerAtTime">
  <xs:complexContent>
    <xs:extension base="ob:Trigger">
      <xs:sequence>
        <xs:element name="Start" type="ob:DateAndTime" ... />
        <xs:element name="MaxDelay" type="ob:Duration" ... />
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>


Fig. 17.3: Example of XML schema for trigger “TriggerAtTime.” 


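When evaluating triggers as specified in Formula 17.4, triggers of type
“TriggerAtTime” are matched as follows:


\[
\begin{aligned}
T_{\mathit{Pref}} \trianglerighteq T_{\mathit{Pol}} \;\Leftrightarrow\; & (\mathit{Type}(T_{\mathit{Pref}}) = \mathit{Type}(T_{\mathit{Pol}}) = \mathit{TriggerAtTime}) \;\wedge \\
& (T_{\mathit{Pref}}.\mathit{Start} \le T_{\mathit{Pol}}.\mathit{Start}) \;\wedge \\
& ((T_{\mathit{Pref}}.\mathit{Start} + T_{\mathit{Pref}}.\mathit{MaxDelay}) \ge (T_{\mathit{Pol}}.\mathit{Start} + T_{\mathit{Pol}}.\mathit{MaxDelay}))
\end{aligned}
\tag{17.5}
\]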


In other words, one trigger is more permissive than another when it specifies fewer
constraints. In the case of “TriggerAtTime,” $T_{\mathit{Pref}}$ is more permissive
than $T_{\mathit{Pol}}$ when it starts earlier and/or ends later. The obligation of
deleting within one year is thus satisfied by the obligation of deleting within six
months.
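The interval comparison for “TriggerAtTime” can be sketched with Python's `datetime` types. The class and function names below are hypothetical, and the fixed reference date and the approximation of six months as 182 days are assumptions chosen only to make the example concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TriggerAtTime:
    start: datetime
    max_delay: timedelta


def at_time_geq(pref: TriggerAtTime, pol: TriggerAtTime) -> bool:
    """The preference-side window must contain the policy-side window:
    it starts no later and ends no earlier."""
    starts_no_later = pref.start <= pol.start
    ends_no_earlier = pref.start + pref.max_delay >= pol.start + pol.max_delay
    return starts_no_later and ends_no_earlier


now = datetime(2011, 1, 1)  # arbitrary reference date for the example
within_one_year = TriggerAtTime(now, timedelta(days=365))
within_six_months = TriggerAtTime(now, timedelta(days=182))

# A user asking for deletion within one year accepts a service that deletes
# within six months, but not the other way round.
year_accepts_half = at_time_geq(within_one_year, within_six_months)
half_accepts_year = at_time_geq(within_six_months, within_one_year)
```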

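Trigger “TriggerAccessedForPurpose” leads to the execution of the associated
action (within a maximum delay) after each use of personal data for one of the
specified purposes. Such triggers are matched as follows:


\[
\begin{aligned}
T_{\mathit{Pref}} \trianglerighteq T_{\mathit{Pol}} \;\Leftrightarrow\; & (\mathit{Type}(T_{\mathit{Pref}}) = \mathit{Type}(T_{\mathit{Pol}}) = \mathit{TriggerAccessedForPurpose}) \;\wedge \\
& (T_{\mathit{Pref}}.\mathit{Purposes} \subseteq T_{\mathit{Pol}}.\mathit{Purposes}) \;\wedge \\
& (T_{\mathit{Pref}}.\mathit{MaxDelay} \ge T_{\mathit{Pol}}.\mathit{MaxDelay})
\end{aligned}
\tag{17.6}
\]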


In other words, one trigger “TriggerAccessedForPurpose” is more permissive
than another when it reacts to fewer events (i.e. a subset of purposes) or reacts more
slowly (i.e. allows a longer response time). The obligation of notifying each use for
purpose “treatment” within one day is more permissive than the obligation of notifying
any use within one hour.
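A minimal sketch of this comparison, using Python sets for the purposes. The names are hypothetical, and “any use” is modelled here as an explicit, made-up universe of purposes, which is an assumption of the example rather than something the language prescribes.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class TriggerAccessedForPurpose:
    purposes: FrozenSet[str]
    max_delay: timedelta


def accessed_for_purpose_geq(
    pref: TriggerAccessedForPurpose, pol: TriggerAccessedForPurpose
) -> bool:
    """More permissive = reacts to a subset of the purposes and allows a
    response time that is at least as long."""
    return pref.purposes <= pol.purposes and pref.max_delay >= pol.max_delay


# Hypothetical universe of purposes standing in for "any use".
ALL_PURPOSES = frozenset({"treatment", "research", "billing"})

notify_treatment_daily = TriggerAccessedForPurpose(
    frozenset({"treatment"}), timedelta(days=1)
)
notify_any_hourly = TriggerAccessedForPurpose(ALL_PURPOSES, timedelta(hours=1))

treatment_is_weaker = accessed_for_purpose_geq(notify_treatment_daily, notify_any_hourly)
any_is_weaker = accessed_for_purpose_geq(notify_any_hourly, notify_treatment_daily)
```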

Defining the syntax and semantics of all triggers is out of the scope of this
chapter. See [Pri09b] for more details. Here is a short description of the predefined
triggers.


• TriggerAtTime(start, maxDelay): executes the associated action once at some
  time between start and start + maxDelay.

• TriggerPeriodic(start, end, maxDelay, period): executes the associated action
  once per period.

• TriggerPersonalDataAccessedForPurpose(purpose, maxDelay): executes the
  associated action each time the personal data is used for specified purposes.

• TriggerPersonalDataDeleted(maxDelay): executes the associated action when
  the personal data is deleted.

• TriggerPersonalDataSent(thirdParty, maxDelay): executes the associated action
  when the personal data is shared with a third party.

• TriggerDataLost(maxDelay): executes the associated action in case of a major
  issue leading to data theft.

• TriggerOnViolation(obligation, maxDelay): executes the associated action in
  case of a violation of the referenced obligation.


17.3.2 Actions 


We defined four actions, namely Secure Log, Delete Personal Data, Anonymise
Personal Data, and Notify Data Subject.
Action “ActionSecureLog” is defined in Figure 17.4. This action has five parameters:
“Integrity Level,” i.e. protection against modification of logs, “Confidentiality
Level,” i.e. protection against unauthorised accesses of logs, “Non-Repudiation
Level,” i.e. protection against repudiation of logs, “Time-Stamping Level,” i.e.
guarantees on when a log was added, and “Availability Level,” i.e. protection against
loss of logs.
<xs:complexType name="ActionSecureLog">
  <xs:complexContent>
    <xs:extension base="ob:Action">
      <xs:sequence>
        <xs:element name="IntegrLevel" type="xs:decimal"/>
        <xs:element name="ConfidLevel" type="xs:decimal"/>
        <xs:element name="NonRepLevel" type="xs:decimal"/>
        <xs:element name="TimeStampLevel" type="xs:decimal"/>
        <xs:element name="AvailabLevel" type="xs:decimal"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>


Fig. 17.4: Example of XML schema for action “ActionSecureLog.”


When evaluating actions as specified in Formula 17.4, actions of type
“ActionSecureLog” are matched as follows:


\[
\begin{aligned}
A_{\mathit{Pref}} \trianglerighteq A_{\mathit{Pol}} \;\Leftrightarrow\; & (\mathit{Type}(A_{\mathit{Pref}}) = \mathit{Type}(A_{\mathit{Pol}}) = \mathit{ActionSecureLog}) \;\wedge \\
& (A_{\mathit{Pref}}.\mathit{IntegrLevel} \le A_{\mathit{Pol}}.\mathit{IntegrLevel}) \;\wedge \\
& (A_{\mathit{Pref}}.\mathit{ConfidLevel} \le A_{\mathit{Pol}}.\mathit{ConfidLevel}) \;\wedge \\
& (A_{\mathit{Pref}}.\mathit{NonRepLevel} \le A_{\mathit{Pol}}.\mathit{NonRepLevel}) \;\wedge \\
& (A_{\mathit{Pref}}.\mathit{TimeStampLevel} \le A_{\mathit{Pol}}.\mathit{TimeStampLevel}) \;\wedge \\
& (A_{\mathit{Pref}}.\mathit{AvailabLevel} \le A_{\mathit{Pol}}.\mathit{AvailabLevel})
\end{aligned}
\tag{17.7}
\]


In other words, one action is more permissive than another when it specifies
fewer constraints. In the case of “ActionSecureLog,” $A_{\mathit{Pref}}$ is more
permissive than $A_{\mathit{Pol}}$ when it requires lower (or equal) security levels
for integrity, confidentiality, non-repudiation of origin, time-stamping, and
availability.
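The level-by-level comparison for “ActionSecureLog” can be sketched as follows. The Python field names mirror the schema elements of Figure 17.4 but are otherwise hypothetical, and the numeric levels in the example are invented.

```python
from dataclasses import dataclass, fields
from decimal import Decimal


@dataclass(frozen=True)
class ActionSecureLog:
    integr_level: Decimal
    confid_level: Decimal
    non_rep_level: Decimal
    time_stamp_level: Decimal
    availab_level: Decimal


def secure_log_geq(pref: object, pol: object) -> bool:
    """The preference-side action is more or equally permissive when both
    are secure-log actions and every required level is at most the level
    the policy offers."""
    if not (isinstance(pref, ActionSecureLog) and isinstance(pol, ActionSecureLog)):
        return False
    return all(
        getattr(pref, f.name) <= getattr(pol, f.name) for f in fields(ActionSecureLog)
    )


# Invented levels: the user requires modest guarantees, the service offers more.
required = ActionSecureLog(
    Decimal("0.5"), Decimal("0.5"), Decimal("0.2"), Decimal("0.2"), Decimal("0.1")
)
offered = ActionSecureLog(
    Decimal("0.9"), Decimal("0.5"), Decimal("0.4"), Decimal("0.2"), Decimal("0.3")
)

requirement_met = secure_log_geq(required, offered)
reverse_met = secure_log_geq(offered, required)
```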
Defining the syntax and semantics of all actions is out of the scope of this chapter.
See [Pri09b] for more details. Here is a short description of the predefined actions.


• ActionSecureLog(integrityLevel, confidentialityLevel, nonRepudiationLevel,
  timeStampingLevel, availabilityLevel): the action of logging specific events
  related to personal data.

• ActionDeletePersonalData: the action of deleting personal data.

• ActionAnonymizePersonalData: the action of removing identifiers from personal
  data.

• ActionNotifyDataSubject(media, address): the action of notifying data subject
  about specific events related to personal data.


17.3.3 Enforcement 


The “Obligation Enforcement Engine” is in charge of enforcing obligations. When
personal data are collected, related triggers are analysed in order to register for
relevant events (e.g. access to personal data stored in a legacy database) and schedule
future actions (e.g. schedule a deletion in eleven months to enforce “delete within
one year”). The enforcement engine reacts to relevant events and executes associated
actions (e.g. log, delete).
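The two duties described here, registering for events and scheduling future actions, can be illustrated with a small in-memory sketch. This is not the PrimeLife engine: the class below, its method names and the event name "accessed" are all hypothetical, and a real engine would persist its schedule and hook into the data store.

```python
import heapq
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable


class ObligationEnforcementEngine:
    """Toy engine: time-based actions sit in a heap, event-based actions
    are kept per event type. Actions are zero-argument callables."""

    def __init__(self) -> None:
        self._scheduled = []                  # heap of (due, sequence, action)
        self._handlers = defaultdict(list)    # event type -> list of actions
        self._sequence = 0                    # tie-breaker for equal due dates

    def schedule(self, due: datetime, action: Callable[[], None]) -> None:
        heapq.heappush(self._scheduled, (due, self._sequence, action))
        self._sequence += 1

    def on_event(self, event_type: str, action: Callable[[], None]) -> None:
        self._handlers[event_type].append(action)

    def notify(self, event_type: str) -> None:
        """Called when a relevant event (e.g. a data access) is observed."""
        for action in self._handlers[event_type]:
            action()

    def tick(self, now: datetime) -> None:
        """Execute every scheduled action whose due date has passed."""
        while self._scheduled and self._scheduled[0][0] <= now:
            _, _, action = heapq.heappop(self._scheduled)
            action()


log = []
engine = ObligationEnforcementEngine()
collected = datetime(2011, 1, 1)

# "Delete within one year", enforced by scheduling the deletion after eleven months.
engine.schedule(collected + timedelta(days=330), lambda: log.append("delete"))
# "Log each access", enforced by reacting to access events.
engine.on_event("accessed", lambda: log.append("log access"))

engine.notify("accessed")
engine.tick(collected + timedelta(days=100))   # nothing is due yet
engine.tick(collected + timedelta(days=331))   # the deletion is now due
```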


17.4 Authorisations 


When occurring in a Data Controller’s policy, authorisations express the minimal 
(i.e., the least permissive set of) rights that a Data Controller wants to obtain on re- 
quested personal data. When occurring in a Data Subject’s preferences, they express 
the maximal (i.e., the most permissive set of) rights that she is willing to grant to a 
Data Controller with respect to her personal data. When occurring in a sticky policy, 
authorisations express the rights that the Data Subject has explicitly agreed to grant 
to the Data Controller. 

The main difference with obligations is that not performing an authorised action 
is not a violation of the policy. Performing an action that is not explicitly authorised, 
on the other hand, is a violation. 

We model two types of authorisations, namely the authorisation to use personal
data for a specified purpose, and the authorisation to forward personal data to third
parties (i.e., downstream usage). We focus on the former type here, and discuss the
latter type in more detail in Section 17.5.

Authorisation UseForPurpose takes a single parameter p, which is a string in- 
dicating the purpose for which personal data is to be used. Two UseForPurpose 
authorisations match whenever their purposes are equal: 


\[
\mathit{UseForPurpose}(p) \trianglerighteq \mathit{UseForPurpose}(p') \;\Leftrightarrow\; p = p' .
\]


As an extension, one could see usage purposes as occurring in a hierarchy, rather 
than as a flat list, so that for example “telemarketing” can be a subpurpose of “mar- 
keting.” The matching definition then needs to be adapted so that a match occurs 
whenever p is an ancestor of or equal to p’. 
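The hierarchical extension can be sketched by walking from the requested purpose up to the root of the hierarchy. The parent map below and the purposes in it are invented for illustration, and the sketch assumes the hierarchy is acyclic.

```python
from typing import Dict, Optional

# Hypothetical purpose hierarchy: each purpose maps to its parent (None = root).
PARENT: Dict[str, Optional[str]] = {
    "marketing": None,
    "telemarketing": "marketing",
    "delivery": None,
}


def purpose_geq(p: str, p_prime: str, parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """True when p equals p_prime or is one of its ancestors, i.e. when
    UseForPurpose(p) is more or equally permissive than UseForPurpose(p_prime)."""
    current: Optional[str] = p_prime
    while current is not None:
        if current == p:
            return True
        current = parent.get(current)
    return False


grants_sub = purpose_geq("marketing", "telemarketing")   # granted purpose covers sub-purpose
grants_super = purpose_geq("telemarketing", "marketing") # sub-purpose does not cover parent
```

With a flat list of purposes (every parent set to None) the function degenerates to the equality test of the basic rule.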

The authorisation to use personal data only for a specified set of purposes can be 
enforced in the Data Controller’s infrastructure by annotating each access request 
to personal data with the intended usage purpose, and by setting the access control 
policy of personal data so that only requests for purposes included in the sticky 
policy will be permitted. 


17.5 Downstream Data Handling 


A second authorisation that we model in our vocabulary is the authorisation to
forward personal data to other Data Controllers, or as we call it, downstream usage of
personal data. In her preferences, the Data Subject specifies to which downstream
Data Controllers her personal data can be forwarded, and which data handling policy
these downstream controllers have to adhere to. For the Data Controller, we
specify two different mechanisms to express his intentions of forwarding personal
data downstream. Which mechanism is chosen also affects the matching algorithm,
as we will see below.

Our language supports nested downstream usage policies, allowing Data Sub- 
jects, as well as Data Controllers, to specify in full detail the paths that personal 
data is allowed, or intended, to follow. It also allows recursion, enabling Data Sub- 
jects and Data Controllers to express restrictions under which personal data can be 
forwarded indefinitely. Optionally, a maximum forwarding depth can be specified. 


17.5.1 Structure of Downstream Authorisations 


The authorisation to forward the data is expressed by a UseDownstream element, 
which contains the ACUC restrictions under which it can be forwarded as a child 
ACUC element. This child ACUC element can in turn contain one or more Use- 
Downstream elements, thereby enabling the specification of nested policies. If the 
ACUC element does not contain any nested downstream usage authorisations, then 
an optional parameter maxDepth can be used to indicate that personal data can be 
forwarded recursively under the restrictions of ACUC up to the indicated depth, 
which could be any integer or “unbounded”. The XML schema definition is given
below:

<xs:element name="UseDownstream">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="ACUC" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="maxDepth" type="int_or_unbounded"/>
  </xs:complexType>
</xs:element>


Let UseDownstream(ACUC) denote the authorisation to forward personal data
under the restrictions of ACUC, and let UseDownstream(ACUC, depth) denote an
authorisation to forward recursively up to a recursion depth depth. For a given graph
of ACUCs, let $\lfloor \mathit{ACUC} \rfloor$ be the “local” ACUC, meaning the ACUC
containing only those restrictions and obligations that do not affect downstream usage.

Using this notation, we can represent the structure of an ACUC policy with down- 
stream usage as a directed graph where each node represents a hop in the down- 
stream usage. Each node is labeled with the local ACUC policy describing how 
the data are to be treated locally. Each edge represents the permission (in case of a 
provider’s preferences) or intention (in case of a consumer’s policy) to forward the 
data under the restrictions specified by the ACUC policy at the endpoint of the edge. 
For instance, the case where $\mathit{ACUC}_A$ permits the right to share downstream
under $\mathit{ACUC}_B$, but prohibits any further forwarding, is depicted in
Figure 17.5(a).

[Figure: four panels showing graphs of local ACUCs, with edges and loops labelled by
forwarding depth: (a) Nested ACUC; (b) Recursive ACUC; (c) Deeper ACUC;
(d) Incorrect ACUC (loop).]


Fig. 17.5: Examples of nested and recursive downstream usage.


By the restrictions that we imposed on recursion, the structure of the graph is 
similar to that of a tree where the leaf nodes can optionally have a loop, labeled 
with the maximal recursion depth. Figure 17.5(b), for example, represents a simple 
recursive ACUC. Figure 17.5(c) shows a more complicated nested structure, in which
the root ACUC specifies that the data can either be forwarded indefinitely under one
ACUC, or once under a second ACUC and twice under a third. Figure 17.5(d) is not
valid, however, because it contains a cycle. For the sake of readability and to avoid
complicating the matching procedure, we explicitly forbid cycles other than simple
recursions.
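The structural restriction (a tree-like graph whose leaves may carry a recursion loop, and no other cycles) can be checked with a depth-first walk. The sketch below is an illustration under assumptions: the graph is given as a plain dictionary from node names to successor lists, a node that lists itself stands for a recursion loop, and the node names are made up.

```python
from typing import Dict, FrozenSet, List


def is_valid_downstream_graph(edges: Dict[str, List[str]], root: str) -> bool:
    """Return True when the graph reachable from root contains no cycle
    other than self-loops, and self-loops occur only on leaf nodes."""

    def visit(node: str, path: FrozenSet[str]) -> bool:
        successors = edges.get(node, [])
        others = [s for s in successors if s != node]
        if node in successors and others:
            return False  # recursion is only allowed on nodes without nested hops
        for successor in others:
            if successor in path:
                return False  # a cycle that is not a simple recursion
            if not visit(successor, path | {successor}):
                return False
        return True

    return visit(root, frozenset({root}))


nested = {"A": ["B"]}                        # one hop, no further forwarding
recursive = {"C": ["C"]}                     # simple recursion
deeper = {"R": ["S", "T"], "S": ["S"], "T": ["U"], "U": ["U"]}
looping = {"X": ["Y"], "Y": ["X"]}           # cycle over two nodes: not allowed

results = {
    "nested": is_valid_downstream_graph(nested, "A"),
    "recursive": is_valid_downstream_graph(recursive, "C"),
    "deeper": is_valid_downstream_graph(deeper, "R"),
    "looping": is_valid_downstream_graph(looping, "X"),
}
```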


17.5.2 Proactive Matching of Downstream Data Handling 


[Figure: message exchange between the Data Subject (holding Preferences), the Data
Controller and the Downstream Data Controller. Messages: "request PII, policy";
"PII, sticky policy"; "PII, sticky policy′". A Sticky Policy is kept at each
controller.]


Fig. 17.6: Proactive matching of data subject’s privacy preferences and privacy
policies of data controllers.


For the first matching mechanism, which we call proactive matching, the Data
Controller’s policy specifies in full detail to which downstream Data Controllers he
intends to forward personal data, and how they will treat it. Optionally, the Data
Controller’s policy can point to remote downstream policies hosted by the downstream
Data Controllers themselves. Either way, proactive matching requires that
the full chain of downstream Data Controllers and their policies are known at the
moment that the Data Subject releases her personal data to the first Data Controller,
so that the entire policy chain can be taken into account by the matching algorithm
to reach a decision, as depicted in Figure 17.6.

The advantage of proactive matching is that the policies of all downstream Data 
Controllers are known to the Data Subject at the moment she releases her personal 
data to the first Data Controller. This allows her to make a better informed deci- 
sion on whether or not to reveal her personal data, and, in case of a mismatch be- 
tween her preferences and a downstream controller’s policy, gives her the option to 
consciously overrule her own preferences. A possible disadvantage is that the Data 
Controller’s full workflow has to be known and fixed at the moment that personal 
data is released. Not only could the workflow leak sensitive information about the 
Data Controller’s business processes, but it is also not clear what happens when a 
downstream controller’s policy changes between the moment that personal data is 
first revealed and the moment that it is forwarded. Such scenarios require another
matching algorithm, “lazy matching” (see Section 17.5.3).

Matching non-recursive (but possibly nested) downstream usage authorisations 
is done by checking whether the specified ACUC restrictions match: 


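\[
\mathit{UseDownstream}(\mathit{ACUC}) \trianglerighteq \mathit{UseDownstream}(\mathit{ACUC}') \;\Leftrightarrow\; \mathit{ACUC} \trianglerighteq \mathit{ACUC}' .
\]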


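Two recursive downstream usage authorisations are matched by additionally checking
the recursion depths:


\[
\mathit{UseDownstream}(\mathit{ACUC}, \mathit{depth}) \trianglerighteq \mathit{UseDownstream}(\mathit{ACUC}', \mathit{depth}') \;\Leftrightarrow\; \mathit{ACUC} \trianglerighteq \mathit{ACUC}' \wedge \mathit{depth} \ge \mathit{depth}' .
\]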


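Recursive and non-recursive downstream authorisations are essentially matched by
“folding out” both recursion trees and simultaneously iterating over the nodes in
the two tree representations to verify that it is possible to cover each branch of the
policy-side tree with a more or equally permissive branch on the preference side.
For instance, let $\mathit{ACUC}_A$ be the nested policy of Figure 17.5(a), which
forwards once under $\mathit{ACUC}_B$, and let $\mathit{ACUC}_C$ be the recursive
policy of Figure 17.5(b). If
$\lfloor \mathit{ACUC}_C \rfloor \trianglerighteq \lfloor \mathit{ACUC}_A \rfloor$ and
$\lfloor \mathit{ACUC}_C \rfloor \trianglerighteq \lfloor \mathit{ACUC}_B \rfloor$,
then $\mathit{ACUC}_C \trianglerighteq \mathit{ACUC}_A$. However, it is impossible that
$\mathit{ACUC}_A \trianglerighteq \mathit{ACUC}_C$, because $\mathit{ACUC}_C$ allows
deeper recursive downstream usage than $\mathit{ACUC}_A$.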
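The “folding out” idea can be sketched as a recursive comparison of two hop trees. Everything named below is an assumption made for illustration: each hop carries a toy local ACUC (a set of permitted purposes, compared by superset), `loop_depth` counts how many further times the data may be forwarded under the same local ACUC (`math.inf` for unbounded), and recursion is only placed on hops without nested children, as required by the structure described in Section 17.5.1.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Hop:
    local: frozenset                           # toy stand-in for the local ACUC
    children: List["Hop"] = field(default_factory=list)
    loop_depth: Optional[float] = None         # None = no recursion on this hop


def next_hops(hop: Hop) -> List[Hop]:
    """Unfold one step: either the nested children or one more recursive hop."""
    if hop.loop_depth is None:
        return hop.children
    if hop.loop_depth >= 1:
        return [Hop(hop.local, [], hop.loop_depth - 1)]
    return []


def downstream_geq(pref: Hop, pol: Hop, local_geq: Callable[[frozenset, frozenset], bool]) -> bool:
    """Every policy-side branch must be covered by a more or equally
    permissive preference-side branch."""
    if not local_geq(pref.local, pol.local):
        return False
    if pref.loop_depth is not None and pol.loop_depth is not None:
        # Two recursive authorisations: compare the recursion depths directly.
        return pref.loop_depth >= pol.loop_depth
    return all(
        any(downstream_geq(p, q, local_geq) for p in next_hops(pref))
        for q in next_hops(pol)
    )


superset = lambda a, b: a >= b                 # more permitted purposes = more permissive
local = frozenset({"delivery"})

acuc_b = Hop(local)
acuc_a = Hop(local, children=[acuc_b])         # nested: one hop, then stop
acuc_c = Hop(local, loop_depth=math.inf)       # recursive: forward indefinitely

c_covers_a = downstream_geq(acuc_c, acuc_a, superset)
a_covers_c = downstream_geq(acuc_a, acuc_c, superset)
```

With identical local ACUCs, the recursive authorisation covers the nested one, while the nested one runs out of hops before the recursive one does.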


17.5.3 Lazy Matching of Downstream Data Handling 


The proactive matching mechanism described above assumes that all policies of
downstream Data Controllers are known at the time that the Data Subject releases
her personal data to the first Data Controller. There may be situations, however,
where it is not possible to collect all the necessary policies at this time, either
because the workflow reveals business secrets of the Data Controller, because the
workflow is too complex to process efficiently, or because the downstream controllers
are not known yet at the time of matching.

[Figure: message exchange between the Data Subject (holding Preferences), the Data
Controller (holding Policy) and the Downstream Data Controller (holding Policy).
Messages: "request PII, policy"; "PII, sticky policy"; "PII, sticky policy′".]


Fig. 17.7: Lazy matching of data subject’s privacy preferences and privacy policies
of data controllers.

For this reason, we introduce a second mechanism called lazy matching, which 
only takes into account the properties and policies of the Data Controller, but not 
those of any downstream controllers. Here, the Data Controller’s policy merely 
specifies whether he intends to forward personal data downstream. By declaring 
his intention to do so, he implicitly declares to be willing to impose any access and 
usage restrictions on the downstream Data Controllers that the Data Subject may 
specify. At the moment personal data is further forwarded, the downstream Data
Controller’s policy is matched against the sticky policy, which contains the Data
Subject’s preferences with regard to whom and under which conditions the data
can be forwarded. This procedure is illustrated in Figure 17.7.

Both matching mechanisms ensure that eventually the Data Subject’s preferences 
will be adhered to. Proactive matching is more privacy-friendly in the sense that it 
only gives away those authorisations that the Data Controllers explicitly applied 
for in their policies. Lazy matching has the advantage that if the downstream con- 
trollers’ policies change between the moment of revealing and the moment of for- 
warding personal data, and the new policies are still within the Data Subject’s pref- 
erences, the transaction can still go through, while it would have failed if proactive 
matching were used. 

Since lazy matching gives the Data Subject slightly less control over her personal 
data, we introduce an additional boolean attribute allowLazy in an ACUC element 
by means of which the Data Subject can indicate in her preferences whether lazy 
matching is allowed for this ACUC policy. On the Data Controller’s side, the at- 
tribute indicates whether he is willing to use lazy matching, and therefore to enforce 
any ACUC policy dictated by the Data Subject. Two such authorisations are matched 
according to the rule 


\[
\mathit{UseDownstream}(\mathit{ACUC}, \mathit{lazy}) \trianglerighteq \mathit{UseDownstream}(\mathit{ACUC}', \mathit{lazy}') \;\Leftrightarrow\; (\mathit{lazy} \wedge \mathit{lazy}') \vee (\mathit{ACUC} \trianglerighteq \mathit{ACUC}') .
\]

In the case that $\mathit{lazy} \wedge \mathit{lazy}'$, the resulting sticky policy will
specify $\mathit{ACUC}$ as the downstream usage policy, thereby ignoring
$\mathit{ACUC}'$ if it was present at all.
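The lazy rule and its effect on the sticky policy can be sketched in a few lines. The function name and the toy ACUC comparison are hypothetical. The behaviour in the non-lazy branch (the sticky policy carries the policy-side ACUC) is an assumption of this sketch, based on the remark that proactive matching only gives away the authorisations the controller applied for.

```python
from typing import Any, Callable, Optional


def match_use_downstream(
    pref_acuc: Any,
    pref_lazy: bool,
    pol_acuc: Optional[Any],
    pol_lazy: bool,
    acuc_geq: Callable[[Any, Any], bool],
) -> Optional[Any]:
    """Return the downstream ACUC to put into the sticky policy, or None
    when the two authorisations do not match."""
    if pref_lazy and pol_lazy:
        # Both sides accept lazy matching: the Data Subject's ACUC is
        # carried along and checked later, at forwarding time.
        return pref_acuc
    if pol_acuc is not None and acuc_geq(pref_acuc, pol_acuc):
        # Assumption: a proactive match records what the controller asked for.
        return pol_acuc
    return None


# Toy ACUCs: sets of permitted purposes, compared by superset.
superset = lambda pref, pol: pref >= pol
subject_acuc = frozenset({"delivery", "billing"})
controller_acuc = frozenset({"delivery"})

lazy_result = match_use_downstream(subject_acuc, True, None, True, superset)
proactive_result = match_use_downstream(subject_acuc, False, controller_acuc, True, superset)
refused = match_use_downstream(subject_acuc, False, frozenset({"profiling"}), True, superset)
```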


17.6 Conclusion 


In this chapter, we presented a simple yet highly expressive language to specify 
privacy policies and preferences. It gives a clear view on the somewhat complex 
relation between access control and data handling policies, especially in the case 
where downstream usage is taken into consideration. We presented two strategies 
to match a Data Subject’s preferences against Data Controllers’ policies: proactive 
matching, which takes the full chain of downstream Controllers and their policies 
into account at the moment that personal data is revealed, and lazy matching, where 
the downstream policies are only matched when personal data is forwarded. 

The policy engine implemented in PrimeLife (see Chapter 20) opted for the lazy 
matching algorithm. The main reason was that instead of the simple access control 
language used in this chapter, the PrimeLife Policy Language is embedded in the 
much more expressive industry standard XACML. In order to implement proactive 
matching, one would have to implement an algorithm that can test whether one 
XACML policy is more permissive than (i.e., is implied by) another XACML policy. 
Given the lack of formal foundations of XACML, this would be a considerable 
research effort in itself, which, because of the only marginal link to privacy, was 
deemed out of scope for PrimeLife.


Abstract This chapter presents the results of the research on how the current
standards for access control policies can be extended. In particular, Section 18.1
illustrates how privacy issues can be effectively tackled by means of a credential-based
access control that includes anonymous credentials. Section 18.2 shows how the
expressivity of policy languages can be exploited to introduce ontologies that model
credential taxonomies and the relations among them, with a particular stress on the
support for delegation mechanisms. Section 18.3 investigates the privacy issues that
arise in those access control systems that are enriched with a dialog framework that
enables servers to publish their policies. Finally, Section 18.4 maps these proposals
onto a set of possible extensions of the architecture of the current de facto standard
in access control policy languages: XACML.


18.1 Privacy-Preserving Access Control 


Users commonly reveal much more personal data than necessary to be granted ac- 
cess to online resources, even though existing technologies offer functionalities that 
would allow for the authorisation to take place in a privacy-preserving way. The ba- 
sic idea to achieve privacy-preserving access control is to utilise the cryptographic 
features of anonymous credential systems [CL01]. The concept of a credential as it
will be used in the following is simply a bundle of attribute-value pairs that is signed 
by its issuer. 

In our model of privacy-preserving access control systems [CMN+10], the decision
as to whether access is granted to a requester is then dependent on the possession
of, possibly multiple, credentials that fulfill certain requirements specified in a
service provider’s access control policy. For a user to obtain access to a protected
resource, she produces a verifiable claim that contains cryptographic evidence that
the policy is fulfilled and sends it to the service provider who verifies it with respect
to his policy.

Basing the access control decision on the possession of credentials is not new.
However, the use of anonymous credentials provides support to the following new 
features: (a) predicate proofs over attributes from — possibly multiple — credentials, 
(b) selective disclosure of individual attributes — possibly to third parties, and (c) the 
possibility to use predicate proofs and selective disclosure without interaction with 
the credential issuer. Only in access control systems that offer at least those features 
can the authorisation take place in a truly privacy-preserving way, since a user can 
thus control which data and predicates are released and which are not. 

We believe that there are several reasons why anonymous credentials have not yet
been adopted for authorisation systems. One reason is the fact that industry is
currently more
interested in the possibility of profiling their users than in protecting their users’ 
privacy. However, taking into account that privacy violations are reported by the 
media on an almost daily basis, it seems that public awareness on that topic is rising 
slowly but steadily and that the industry will need to adapt to these changes in the 
near future. Another reason for this lack of technology adoption is the absence of 
a suitable authorisation language offering adequate expressiveness to address the 
privacy-friendly functionalities of anonymous credentials. 

To overcome the latter problem, we have developed an authorisation language 
that allows for expressing access control requirements in a privacy-preserving way. 
Although our language is targeted towards anonymous credentials, it allows for the 
specification of authorisation requirements regardless of the underlying technology 
and its implementation details, and it is also applicable for credential technologies 
designed without privacy considerations. 

Before we describe our language and its features, we give a more detailed 
overview of credentials and their functionalities. Afterwards, we describe by means 
of a comprehensive example policy, how the credential functionalities are mapped 
into our language. 


18.1.1 Credentials Enabling Privacy-Preservation 


A credential is a bundle of attribute-value pairs that is provided by an issuer to an 
individual that becomes the credential’s owner. Credentials are always of a certain 
type that specifies the list of attributes that a credential contains. For example, a 
national ID card (issued by a government) may contain the first name, last name and 
date of birth of the owner, while a movie ticket (issued by a movie theater) contains 
the time and date of the showing and a seat number. The issuer vouches for the 
correctness of the information on the credential with respect to the intended owner. 
The issuing process may be carried out on-line, e.g., by visiting the issuer’s website, 
as well as off-line, e.g., at the local town hall. Credentials are issued by means of 
a certain credential technology. Technologies that support our credential model are, 
e.g., anonymous credentials [CL01], X.509 certificates [CSF+08], OpenID [Ope07] 
or SAML [OAS05a]. 

For a user to gain access to a resource protected by a policy, the server has to be 
convinced that the policy is fulfilled. To do so in our model, the credential owner makes 
a claim about the credentials she owns and about the attributes they contain. Al- 
though claims are made independently from any concrete technology, the accom- 
panying claim evidence that authenticates them (typically generated by means of 
cryptographic mechanisms) is specific to the technology underlying the credentials. 

In the following, we list a number of features that existing credential technologies 
have: 


• Proof of ownership. To bind a credential to its legitimate owner, a credential 
  may contain information that is used to authenticate the owner. This could be a 
  picture of the user, a PIN code, a password, or a signing key. Proving credential 
  ownership means that the owner authentication is successfully performed with 
  whatever mechanism is in place. 

• Selective attribute disclosure. Some technologies allow attributes within a 
  credential to be revealed selectively, i.e., the service provider only learns the value 
  of a subset of the attributes contained in the credentials. 

• Proving conditions on attributes. Anonymous credentials enable proving 
  conditions over attributes without revealing their actual values. For all other 
  technologies, the only way to prove that an attribute satisfies a condition is by 
  revealing its value. 

• Attribute disclosure to third parties. Attributes are usually revealed to the 
  relying party enforcing the policy, but the policy could also require certain attributes 
  to be revealed to an external third party. For example, the server may require 
  that the user reveals her full name to a trusted escrow agent, so that she can 
  be de-anonymised in case of fraud, thereby adding accountability to otherwise 
  anonymous transactions. 

• Signing of statements. Certain technologies allow for the signing of a given 
  statement to explicitly consent to it. The signature acts as evidence that this 
  statement was agreed to by a user fulfilling the policy in question. 


We regard credential technologies that support (at least) the first three of the 
above-mentioned features as privacy-preserving credential technologies. The utilisation of 
such technologies enables us to perform access control in a privacy-preserving way, 
which means, in the ideal case, that no more information than strictly required is 
revealed to fulfill a policy. As mentioned earlier, in the context of privacy-preserving 
technology, our focus is on anonymous credentials. 
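
The credential model and the first three features above can be made concrete with a small sketch. It is not the cryptographic mechanism itself: the class names `Credential` and `Claim`, the function `make_claim`, and the sample attribute values are assumptions introduced for illustration, and conditions are simply evaluated locally, where an anonymous credential system would back them with a cryptographic proof.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Credential:
    """A bundle of attribute-value pairs vouched for by an issuer."""
    cred_type: str
    issuer: str
    attributes: Dict[str, Any]


@dataclass
class Claim:
    """What the relying party learns: revealed values and conditions that hold."""
    cred_type: str
    issuer: str
    revealed: Dict[str, Any] = field(default_factory=dict)
    proven: List[str] = field(default_factory=list)


def make_claim(cred: Credential,
               reveal: List[str],
               conditions: Dict[str, Callable[[Dict[str, Any]], bool]]) -> Claim:
    """Disclose only the attributes in `reveal` and state which conditions hold.

    Assumption: conditions are checked in the clear on the owner's side; a real
    anonymous credential system would produce a proof instead.
    """
    failed = [name for name, check in conditions.items()
              if not check(cred.attributes)]
    if failed:
        raise ValueError(f"conditions not satisfied: {failed}")
    return Claim(
        cred_type=cred.cred_type,
        issuer=cred.issuer,
        revealed={name: cred.attributes[name] for name in reveal},
        proven=list(conditions),
    )


# Hypothetical national ID card; only the first name is revealed, while the
# birth year stays hidden behind a condition.
id_card = Credential("IdentityCard", "Government",
                     {"first_name": "Alice", "last_name": "Doe", "birth_year": 1980})
claim = make_claim(id_card,
                   reveal=["first_name"],
                   conditions={"birth_year < 1992": lambda a: a["birth_year"] < 1992})
```

The resulting claim carries the first name and the statement that the condition holds, but neither the last name nor the birth year.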


18.1.2 A Policy Language for Privacy-Preserving Access Control 


The credential-based access control requirements language (CARL) that we have 
developed allows service providers to express the requirements that a user’s cre- 
dentials have to satisfy in order to gain access to a resource. This expressivity is 
achieved with the following features: 

• Privacy preservation. Our language is privacy-preserving in the sense that it 
  supports the principle of minimal information disclosure, i.e., a policy expresses 
  the minimal claim that a user has to present. It does so in terms of which 
  credentials have to be involved in the claim, which attributes of those credentials have 
  to be revealed, and which conditions have to hold over the attributes (no matter 
  whether these are revealed or not). Rather than assuming that all attributes in a 
  credential are revealed by default, it clearly distinguishes between the 
  requirement to reveal the value of an attribute (e.g., the date of birth) and the 
  requirement that an attribute has to satisfy a certain condition (e.g., age greater than 18). 
  This approach allows the user to minimise the amount of data that she reveals 
  to the server, which is important as credentials often contain sensitive personal 
  information. Additionally, our language also supports accountability, so that the 
  user’s anonymity can be revoked by a third party in case of abuse. 

• Technology independence. Our language is independent of the technology 
  underlying the credentials, so that different technologies or even a mix of 
  technologies can be used without modifying the policy specifications. Thus, service 
  providers can specify policies without having to worry about the specifics of the 
  underlying credential technology. 

• Multi-credential claims. Our policy language can express requirements 
  involving multiple credentials at the same time and has a way to refer to individual 
  credentials and the attributes they contain. It can thereby impose “cross-credential” 
  conditions, i.e., conditions involving attributes from different credentials. The 
  possibility to reference individual credentials is also important when a user has 
  multiple credentials of the same type. For example, when a user has two credit 
  cards, the policy should be unambiguous about whether it wants to see the credit 
  card number and security code of the same card or of different cards. 


The following comprehensive example illustrates the main features of our policy 
language. It shows a policy used by a car rental company to determine eligibility 
for a discounted rental car: 


    own mc::MemberShipCard issued-by CARRENTALCOMPANY 
    own cc::CreditCard issued-by AMEX, VISA 
    own dl::DriversLicense issued-by DEPTOFMOTORVEHICLES 
    own is::LiabilityInsuranceStmt issued-by INSURANCECOMPANY 
    reveal cc.number under ‘purpose=payment’ 
    reveal is.policyNo to ESCROWAGENT under ‘in case of damage’ 
    sign ‘I agree with the general terms and conditions.’ 
    where dl.vehicleCategory = ‘M1’ ∧ is.guaranteedEURAmount ≥ ‘30.000’ ∧ 
          (mc.status = ‘gold’ ∨ mc.status = ‘silver’) ∧ cc.expDate > today() ∧ 
          mc.name = dl.name ∧ dl.name = is.name 


The policy states that eligible users are those who (1) have a membership status of gold 
or silver with the company, (2) reveal the number of a valid American Express or 
Visa credit card for payment purposes, (3) are entitled by the Department of Motor 
Vehicles to drive passenger vehicles, (4) reveal the insurance policy number of a 
liability insurance with coverage of at least thirty thousand Euros to a trusted escrow 
agent who may disclose this policy number only in case of damage, (5) consent to 
the general terms and conditions, and (6) have the membership card, the driver’s 
license and the insurance statement issued in the same name (to ensure that the 
driver is a member and insured). Through the use of credential identifiers, 
it is ensured that, e.g., the credit card number that is revealed comes from the 
same card whose validity is tested. 

An important aspect to note is that a user fulfilling this particular policy with 
privacy-preserving credentials reveals only two pieces of information to the 
car rental company: the credit card number and the fact that she fulfills the 
policy. The latter includes the assurance that the insurance policy number is revealed 
to the trusted escrow agent, which makes the user accountable in case of damage to 
the car. When revealing a credential’s attribute, a data handling policy can also be 
attached to the revealed data: in the policy above, for instance, a purpose is attached 
to the request for the disclosure of the credit card number. 
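
To make the semantics of the car rental policy tangible, the following sketch evaluates it over plain dictionaries. It is a simplification under stated assumptions: the dictionary layout, the function name `evaluate_rental_policy`, and the sample values are hypothetical, all attributes are visible to the evaluating code (whereas with anonymous credentials the conditions would be proven without disclosure), and the returned dictionary merely records who would learn what.

```python
from datetime import date


def evaluate_rental_policy(creds, today):
    """Return the disclosures required by the policy, or None if it is not fulfilled.

    `creds` maps the credential identifiers mc, cc, dl, is to dictionaries
    holding an 'issuer' entry and the attributes (assumed layout).
    """
    mc, cc, dl, ins = creds["mc"], creds["cc"], creds["dl"], creds["is"]

    issuers_ok = (
        mc["issuer"] == "CARRENTALCOMPANY"
        and cc["issuer"] in {"AMEX", "VISA"}
        and dl["issuer"] == "DEPTOFMOTORVEHICLES"
        and ins["issuer"] == "INSURANCECOMPANY"
    )
    where_ok = (
        dl["vehicleCategory"] == "M1"
        and ins["guaranteedEURAmount"] >= 30000
        and mc["status"] in {"gold", "silver"}
        and cc["expDate"] > today
        # Cross-credential condition: all three documents carry the same name.
        and mc["name"] == dl["name"] == ins["name"]
    )
    if not (issuers_ok and where_ok):
        return None

    return {
        # The number comes from the same card `cc` whose expiry was tested.
        "to_server": {"cc.number": cc["number"]},
        "to_escrow_agent": {"is.policyNo": ins["policyNo"]},
        "signed": "I agree with the general terms and conditions.",
    }


# Hypothetical credentials of one user.
sample = {
    "mc": {"issuer": "CARRENTALCOMPANY", "status": "gold", "name": "Alice Doe"},
    "cc": {"issuer": "VISA", "number": "4111-0000-0000-0000",
           "expDate": date(2012, 6, 30)},
    "dl": {"issuer": "DEPTOFMOTORVEHICLES", "vehicleCategory": "M1",
           "name": "Alice Doe"},
    "is": {"issuer": "INSURANCECOMPANY", "guaranteedEURAmount": 50000,
           "policyNo": "P-123", "name": "Alice Doe"},
}
result = evaluate_rental_policy(sample, today=date(2011, 1, 1))
```

Note that the membership status, the vehicle category, the insured amount, and the name never appear in the result: they are only tested, not revealed.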

We have also defined the full grammar as well as the formal semantics for 
our policy language [CMN+10]. The semantics abstractly defines the intended 
behaviour of an access control system for a given policy and thereby defines the 
obligations that an actual implementation must meet. 


18.2 Credential Ontologies: Concepts and Relations 


The formal language in logic-based proposals, including ours, can be extended to 
perform ontological inference and allow for the derivation of new concepts (also 
known as abstractions, abbreviations, or macros) from an initial set of basic con- 
cepts. 

Abstractions represent a shorthand for expressing, with a single concept, a com- 
position (e.g. a set, a disjunction, a conjunction) of multiple concepts. Thus, the 
use of abstractions in the policy specification provides a compact and easy way to 
refer to complex concepts. For instance, “Id_Document” can be defined as an ab- 
straction for any element in a set of credentials like {Identity_Card, Driver_License, 
Passport}. An authorization specifying that the requester needs to provide an 
id-document to access a resource can then be satisfied with any of the three credentials 
above. A de facto standard like XACML does not provide explicit support for the 
creation of abstractions. In the following, we propose formal definitions as guidelines 
for such an extension. 


18.2.1 Abstractions 


Formally, we define an abstraction as follows. 




Definition 18.1 (Abstraction). An abstraction is a rule of the form g ← d, where d 
is a complex credential condition, and g is a sequence of symbols that works as a 
meaningful shorthand. 


For instance, abstractions: 


• c::Id_Document ← c::Identity_Card ∨ c::Passport ∨ c::Driver_License 
• e::E_Money ← e::Credit_Card ∨ e::Debit_Card ∨ e::PayPal 


define Id_Document and E_Money as two abstract credential types corresponding to 
any element in the sets of credentials {Identity_Card, Passport, Driver_License} and 
{Credit_Card, Debit_Card, PayPal}, respectively. Hence, a request for an identify- 
ing document (credential of type Id_Document) can be satisfied by providing either 
an identity card, a passport, or a driver license; a transaction calling for a payment 
with E_Money can be carried out by means of a credit card, a debit card, or PayPal. 
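
As an illustration, abstractions over credential types can be expanded mechanically. The sketch below is one possible reading, assuming that abstractions are stored as a dictionary from abstract type to its alternatives and that the definitions are acyclic; the names `ABSTRACTIONS`, `expand`, and `satisfies` are introduced here and are not part of the language definition.

```python
# Disjunctive abstractions from the text, stored as sets of alternatives.
ABSTRACTIONS = {
    "Id_Document": {"Identity_Card", "Passport", "Driver_License"},
    "E_Money": {"Credit_Card", "Debit_Card", "PayPal"},
}


def expand(cred_type, abstractions=ABSTRACTIONS):
    """Return the set of basic credential types that `cred_type` stands for.

    Alternatives may themselves be abstractions, so the expansion recurses
    (assumption: no abstraction refers back to itself).
    """
    if cred_type not in abstractions:
        return {cred_type}
    basic = set()
    for alternative in abstractions[cred_type]:
        basic |= expand(alternative, abstractions)
    return basic


def satisfies(presented_type, required_type, abstractions=ABSTRACTIONS):
    """True if a credential of `presented_type` fulfils a request for `required_type`."""
    return presented_type in expand(required_type, abstractions)
```

With these definitions, `satisfies("Passport", "Id_Document")` holds, while a PayPal credential does not count as an identifying document.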

Abstractions can be exploited for defining and organizing concepts and tax- 
onomies without the need for hierarchical data structures in traditional ontologies. 
Moreover, as shown in the following, abstractions can allow for the expression of 
policies based on chains of credentials, thus providing a support for the introduction 
of delegation mechanisms in credential-based systems. 


18.2.2 Delegation by Recursion 


One of the most interesting features offered by logic-based policy languages is that 
their expressiveness can be exploited to support recursive conditions [ADP+10]. 
Recursion plays a crucial role in the context of access control, as it allows us to 
express restrictions on how authorities and trusted parties can delegate the ability to 
issue credentials. The delegation consists of a certification of the ability of a third 
party to produce credentials on behalf of the delegator. Delegation increases the 
flexibility in complex distributed systems, and it allows for inexpensive creation of 
credentials, particularly in an open environment, where we often deal with applica- 
tion requirements calling for the specification of restrictions in delegation. 

A policy language can support recursion by expressing conditions on data with a 
recursive structure. Let us illustrate this feature below. 

Let $U$ be the set of all users that can take part in an access control process. Let 
$\rho \subseteq U \times U$ be a relation between elements in $U$. As an example to 
illustrate our ideas, let us focus on certification authorities, and let $\rho$ be a 
relation that holds between two certification authorities $u$ and $v$ if and only if $u$ 
has signed $v$’s public key on a certificate. With such a signature, $u$ delegates to $v$ 
the authority to produce credentials, that is, any document certified by $v$ is to be 
considered as certified by $u$. In turn, $v$ has the possibility to delegate her power to 
another certification authority, so that a chain of delegation is created. The description 
of such an organization must be maintained in a data structure accessible by all the users 
that rely on the relevant certification authorities. 

Let $\Theta_\rho$ be the information entity that exhaustively describes $\rho$, in the 
form of a sequence of entries modeled like credentials of type $Rel_\rho$, including two 
identifiers as attributes that refer to pairs of users that are in relation $\rho$ in $U$: 

$$\Theta_\rho = \{\, \theta :: Rel_\rho \ \text{such that}\ (\theta.user_1, \theta.user_2) \in \rho \,\}.$$ 

When $\rho$ holds between $u$ and $v$, that is, $(u,v) \in \rho$, we assume that there 
exists a $\theta \in \Theta_\rho$ such that $\theta.user_1 = u$ and $\theta.user_2 = v$. 

Conditions on data with a recursive structure like the one mentioned above can 
be requested in an access control policy. In our example, in which $\rho$ is a relation of 
delegation between certification authorities, a requester trying to access a particular 
resource may be required by the server to show that the certification authority $ca_r$ 
signing her credentials has been delegated by a particular authority $ca_s$ preferred by 
the server. 

The policy will then include the relevant condition on such a delegation: 


    own th::Rel_ρ issued-by CA_S 
    where th.user_1 = ‘ca_s’ ∧ th.user_2 = ‘ca_r’ 


Such a condition can be easily rewritten according to the following abstraction: 

    th.users = (u,v) ← th.user_1 = u ∧ th.user_2 = v 


In this scenario, it may be convenient to introduce the transitive closure of the 
delegation chain. For instance, instead of setting conditions on the delegating authority 
that allowed for the requester’s certificate to be certified by a third party, a server 
may be interested in ensuring that the root authority $ca_{root}$ at the very beginning of 
the delegation chain is among its preferred ones. 

The requester can prove that her $ca_r$ is in the relation $\rho^*$ (i.e., the transitive 
closure of $\rho$) with $ca_{root}$ either by showing that $(ca_{root}, ca_r) \in \rho$, or 
by providing a chain of context entries $\theta_1, \ldots, \theta_n \in \Theta_\rho$, where 
$ca_{root}$ is $user_1$ in $\theta_1$, $ca_r$ is $user_2$ in $\theta_n$, and for all 
$1 \le i < n$, $\theta_i.user_2 = \theta_{i+1}.user_1$, which can be abbreviated in what we 
define as a recursive condition: 


    th.users =* (u,v) ← th.users = (u,v) ∨ 
                        th.users = (u,th′.user_1) ∧ th′.users =* (th′.user_1,v). 
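
The recursive condition amounts to a reachability test over the delegation entries. The following sketch is one way to check it, assuming that the entries are available as a list of (user_1, user_2) pairs; the function name `delegation_chain` and the sample authority names are hypothetical. It uses a breadth-first search rather than literal recursion, so it terminates even if the delegation data contains cycles.

```python
from collections import deque


def delegation_chain(entries, root, target):
    """Return a chain of (user_1, user_2) entries linking `root` to `target`.

    Consecutive entries satisfy entry_i.user_2 == entry_{i+1}.user_1, as
    required by the recursive condition. Returns None if no chain exists.
    """
    successors = {}
    for user_1, user_2 in entries:
        successors.setdefault(user_1, []).append(user_2)

    queue = deque([(root, [])])
    visited = {root}
    while queue:
        node, path = queue.popleft()
        for nxt in successors.get(node, []):
            step = path + [(node, nxt)]
            if nxt == target:
                return step
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, step))
    return None


# Hypothetical delegation entries: ca_root delegates to ca_a, which delegates to ca_r.
theta = [("ca_root", "ca_a"), ("ca_a", "ca_r"), ("ca_x", "ca_y")]
chain = delegation_chain(theta, "ca_root", "ca_r")
```

A requester could present the returned chain as evidence; a direct delegation yields a chain of length one, which corresponds to the base case of the recursive condition.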


18.3 Dialog Management 


As mentioned before, the most widespread access control policy language to date, 
XACML, is affected by serious privacy issues. 

In fact, it assumes that the engine enforcing access control has all the information 
needed to evaluate whether an authorization policy is satisfied, and no infrastructure 
to manage the dialog between the parties is called for. The evaluation of a policy 
can result in four possibilities: permit, deny, not applicable (if the policy does not 
apply to the request), or indeterminate (if the server does not have the information 
necessary to evaluate the policy). In an open world scenario, this would require the 
requester to reveal all the necessary credentials together with the service request. 
The disclosure of the complete portfolio of credentials of a user is a very strong 
requirement: requesters reasonably want the ability to send to the counterpart only 
what is needed to acquire access to the desired resources. Our proposal based 
on anonymous credentials enables access requests in accordance with the minimal 
disclosure principle. Such innovation, however, relies on an extension of the system 
with a dialog management infrastructure that allows requesters to know the access 
control policy they need to satisfy, and, thus, enables them to select a proper set of 
credentials to show [ACK+10]. Such a communication infrastructure is useful in 
another way: evaluations to indeterminate caused by information missing at the 
server would be avoided, because the framework can notify requesters which 
information is still required, giving them the possibility to provide it and 
acquire access to the desired resource. 


18.3.1 Policy Sanitisation 


With the enrichment of access control systems with communication from servers to 
users on the enforced policies, another type of privacy issue arises. 

For instance, suppose that an authorization imposes that attribute nationality 
should be equal to “US”. Should the server communicate such a condition to the 
requester? Or should it just inform the requester that she has to state her 
nationality? There is no unique answer to this question, and which option is to be 
preferred depends on the specific context. 

However, communicating the complete policy (i.e., the fact that the policy will 
grant access if the nationality is US) favors the privacy of the requester. A requester 
can know, before releasing credentials or information to the server, whether the 
release will be sufficient to acquire access to the service. A client associated with 
a non-US user can avoid disclosing the nationality of the user. On the contrary, 
disclosing only a part of the policy protects the server’s privacy. Access control 
policies are considered sensitive information, and as such they need to be protected. 
For instance, while the server might not mind disclosing the fact that access to a 
service is restricted to US citizens, it might not want to disclose other conditions as 
they are considered sensitive. 

Consider an authorization allowing access to a service to those users who work 
for an organization that does not appear in a Secret Black List (SBL) kept by the 
server. The corresponding condition, for some credential c carrying employment 
information, would then be: c.employer ∉ SBL. Communicating the complete policy 
to the requester would imply releasing the condition above, together with the state 
of black list SBL. Also, even assuming the content of SBL is not released, the requester 
will know, in case she is not granted access, that her employer is black listed. This 
is clearly information the server does not wish to disclose; rather, the server will 
want to keep the condition confidential and simply state that the employment 
certificate is required. 

Between the two extremes of the current XACML approach of simply returning 
indeterminate, on the one hand, and of completely disclosing the policy, on the other 
hand, there are other options offering different degrees of protection to the server 
policy and of information communicated to the requester. 

Each condition appearing in the policy can then be subject to a different 
disclosure policy, regulating the way the presence of such a condition should be 
communicated to the requester. We can distinguish five different disclosure policies, and 
each one can be used independently in any condition appearing in an expression. 
We assume to have a condition c.A π v (where A is an attribute, π a predicate, and v 
a value) on a credential of type T issued by authority S (c :: T issued-by S) to 
illustrate the effects of the disclosure policies, as below, where we include the portion of 
a condition not to be disclosed in square brackets. 


• None. Nothing can be disclosed about the condition. It corresponds to the 
  XACML approach of communicating that the outcome of the policy is 
  indeterminate, since there are conditions that cannot be evaluated. Formally, the 
  condition will appear in the policy on the server’s side completely included in square 
  brackets, that is, [c :: T issued-by S] and [c.A π v]. 

• Credential. Only the information that there is a request for a credential of a 
  specific type can be disclosed. There is no further information on how such 
  credential is to be evaluated. The condition will appear on the server’s side like 
  c :: T [issued-by S] and [c.A π v]. 

• Attribute. The information that an attribute needs to be evaluated can be released; 
  no information can be released on the control that will be enforced on such an 
  attribute: c :: T issued-by S and c.A [π v]. 

• Predicate. The predicate with which the attribute in the condition is evaluated 
  can be released; no information can be released on the value/s against which the 
  evaluation is performed. On the server’s side the condition is written as follows: 
  c :: T issued-by S and c.A π [v]. 

• Condition. The condition can be fully disclosed as it is. Formally, the condition 
  will appear in the expression with no square brackets, signaling that no 
  component is subject to disclosure restriction, i.e., c :: T issued-by S and c.A π v. 


Table 18.1 summarizes the different disclosure policies, reporting the formal 
notation with which they appear in the server’s policy and the relevant communication 
to the client in the dialog. A similar approach is presented in [ADP+10]. 

Note that the disclosure policies of the server, affecting the information released 
to the requester about the conditions appearing in the policy, also impact the way the 
requester can satisfy the conditions. In particular, the credential policy implies that 
the requester will not know which information in the credential is needed and 
therefore will have to release the credential in its entirety. The attribute policy implies 
that the requester can selectively disclose the attribute in the credential. The same 
happens with the predicate policy, where the requester also knows against which 
predicate the attribute will be evaluated. Finally, in the case of the condition policy, 
the requester can either provide the attribute (but it can assess, before submitting, 
whether such a release will satisfy the condition) or provide a proof that the attribute 
satisfies the condition. 


    Disclosure   Condition               Condition 
    policy       at server               at client 

    none         [c :: T issued-by S]    [] 
                 [c.A π v]               [] 
    credential   c :: T issued-by S      c :: T issued-by S 
                 c.[A π v]               c.[] 
    attribute    c :: T issued-by S      c :: T issued-by S 
                 c.A [π v]               c.A [] 
    predicate    c :: T issued-by S      c :: T issued-by S 
                 c.A π [v]               c.A π [] 
    condition    c :: T issued-by S      c :: T issued-by S 
                 c.A π v                 c.A π v 


Table 18.1: Disclosure policies and their effect on conditions. 

Consider a policy stating that “a user can access a service if her nationality is 
Italian, her city of birth is Milan, and her year of birth is earlier than 1981”. Suppose 
that all attributes mentioned in the policy must be certified by an X.509 identity card 
released by IT_Gov. The policy is formally stated as: 


    own id::IdentityCard issued-by [IT_Gov] 
    where id.method = ‘X.509’ ∧ id.nationality = [‘Italy’] ∧ 
          id.city_of_birth = ‘Milan’ ∧ id.year_of_birth < [1981] 


Here, the square brackets representing the disclosure policies implicitly state that: 
i) conditions on attributes method and city_of_birth can be disclosed as they are; ii) 
conditions on the issuer and attribute nationality need to be protected by hiding the 
control that will be enforced on them; and iii) condition on attribute year_of_birth 
needs to be protected by hiding the value against which the evaluation will be per- 
formed. If the above policy applies to a request submitted by a requester for which 
the server has no information, the following conditions are communicated to the 
requester. 


    own id::IdentityCard issued-by [] 
    where id.method = ‘X.509’ ∧ id.nationality = [] ∧ 
          id.city_of_birth = ‘Milan’ ∧ id.year_of_birth < [] 


The requester can satisfy such conditions by releasing an identity card containing 
the requested attributes. 
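
The sanitisation step for attribute conditions can be sketched as a function from the server-side condition and its disclosure policy to the client-side view. This is an illustrative reading only: the class `Condition`, the function `sanitise`, and the string rendering of the views are assumptions made here, not part of the policy language.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Condition:
    """A server-side condition c.A π v together with its disclosure policy."""
    attribute: str    # A
    predicate: str    # π
    value: Any        # v
    disclosure: str   # 'none', 'credential', 'attribute', 'predicate' or 'condition'


def sanitise(cond: Condition, cred_id: str = "c") -> str:
    """Return what the requester is told about the condition (assumed rendering)."""
    views = {
        "none": "[]",
        "credential": f"{cred_id}.[]",
        "attribute": f"{cred_id}.{cond.attribute} []",
        "predicate": f"{cred_id}.{cond.attribute} {cond.predicate} []",
        "condition": f"{cred_id}.{cond.attribute} {cond.predicate} {cond.value!r}",
    }
    return views[cond.disclosure]


# Two conditions from the identity card example above.
city = Condition("city_of_birth", "=", "Milan", "condition")
year = Condition("year_of_birth", "<", 1981, "predicate")
client_view = [sanitise(city, "id"), sanitise(year, "id")]
```

The fully disclosed condition on the city of birth is passed on unchanged, whereas the requester only learns that the year of birth will be compared with some hidden value.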


18.4 Integration into XACML 


The goal of this section is to describe how one can bring privacy-preserving access 
control (as described above) to the real world [ADN+10] by leveraging the status of 
XACML as the de facto standard in access control languages. To do so, a number of 
issues need to be addressed: (1) First, XACML does not manage attributes bundled 
in credentials, and thus does not allow for distinguishing whether two attributes are 
contained in the same credential or in different ones. (2) Second, XACML prescribes 
that the requester communicates all of her attributes to the server for the evalua- 
tion of the access control policy, which is problematic from a privacy perspective. 
Some technologies such as SAML, OpenID, and anonymous credentials, offer the 
possibility to reveal only a subset of the attributes contained in a credential. Such 
features can be exploited by first communicating the policy to the requester, so that 
she can disclose only the information necessary for the access. (3) Third, XACML 
and SAML merely allow requesters to reveal concrete attribute values, rather than 
allowing them to prove that certain conditions over the attributes hold. This further 
privacy-preserving feature can be obtained by leveraging the cryptographic power 
of anonymous credentials. 

We envisage a setting where the requester owns a set of credentials obtained from 
various issuers, possibly implemented in different credential technologies. Servers 
host resources and protect them with policies expressed in an extended version of 
XACML. Users requesting access to a resource receive the relevant policy, which 
describes the requirements on the requester’s credentials in order to be granted ac- 
cess. The policy may include provisional actions, i.e., actions that the requester 
needs to fulfill prior to being granted access. Subsequently, the requester inspects the 
policy and, if she has the necessary credentials to satisfy it, she creates a claim over 
a suitable subset of her credentials, which can describe (1) values of attributes con- 
tained in these credentials, (2) conditions over non-disclosed attributes, and (3) the 
fulfilled provisional actions. The requester derives (technology-specific) evidence 
for the claim to convince the server of its correctness and her ownership of the cre- 
dentials. Afterwards, the requester makes a new request for the resource, but this 
time she includes the created claim and evidence. The server verifies the validity of 
the evidence with respect to the claim and evaluates whether the policy is fulfilled 
by the claim. Access is granted or denied accordingly. 
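
The two-round interaction described above can be sketched as follows. All names are hypothetical: the policy is reduced to a dictionary with the attributes to reveal and the conditions to prove, `PolicyRequired` is a label invented here for the first-round answer, and evidence verification is delegated to a caller-supplied function because it depends on the credential technology.

```python
def server_handle(policy, verify, claim=None, evidence=None):
    """Answer an access request, with or without an attached claim.

    Round 1 (no claim): the applicable policy is returned to the requester.
    Round 2 (claim and evidence): the evidence is verified and the policy is
    evaluated against the claim.
    """
    if claim is None:
        return {"decision": "PolicyRequired", "policy": policy}
    if not verify(claim, evidence):
        return {"decision": "Deny"}
    reveals_ok = all(attr in claim["revealed"] for attr in policy["reveal"])
    conditions_ok = all(cond in claim["proven"] for cond in policy["conditions"])
    return {"decision": "Permit" if reveals_ok and conditions_ok else "Deny"}


# Stand-in for technology-specific verification (assumption: a fixed token).
def verify(claim, evidence):
    return evidence == "valid-proof"


policy = {"reveal": ["sex"], "conditions": ["age >= 21"]}

first = server_handle(policy, verify)                 # plain request
claim = {"revealed": {"sex": "F"}, "proven": ["age >= 21"]}
second = server_handle(policy, verify, claim, "valid-proof")
```

The requester learns the policy in the first round and discloses only the gender in the second; the date of birth itself never leaves her side.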

Before describing the extensions that we made to XACML, we give a short intro- 
duction to it. XACML defines an XML-based access control policy language as well 
as a processing model for evaluating the policies on the basis of a given XACML 
access request. Such a request specifies by means of attributes which subject (i.e., 
who) wants to perform which action (i.e., do what) on which resource (i.e., on what). 

An XACML policy is basically a structured set of Rules that define positive or 
negative authorisations (Permit or Deny rules). A rule contains a Target and a Condi- 
tion that together determine its applicability, i.e., to which access requests it applies. 

An XACML system consists at least of a policy enforcement point (PEP), a policy 
decision point (PDP), a policy administration point (PAP) and a context handler 
(cf. Figure 18.1). Access requesters issue their requests to the PEP, which is 
responsible for enforcing the access control decisions that are rendered by the PDP on 
the basis of the request. The PDP makes decisions with respect to policies that are 
created and maintained by the PAP. The context handler is an intermediate component 
between the PEP and the PDP that buffers the attributes that were given to the 
PEP in the request and provides them to the PDP on demand. 

Fig. 18.1: XACML architecture with extensions. Standard XACML components are 
depicted with solid lines. Extensions are depicted with dotted lines. 
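
The interplay of the standard components can be illustrated with a toy model. It is a strong simplification under stated assumptions: rules are (effect, predicate) pairs as they might be maintained by a PAP, the first applicable rule decides, a missing attribute yields Indeterminate, and the class and method names are chosen here for illustration rather than taken from the XACML specification.

```python
class ContextHandler:
    """Buffers the attributes of a request and provides them on demand."""

    def __init__(self):
        self._attributes = {}

    def store(self, attributes):
        self._attributes = dict(attributes)

    def lookup(self, name):
        return self._attributes[name]  # raises KeyError if the attribute is missing


class PDP:
    """Renders decisions over rules, fetching attributes via the context handler."""

    def __init__(self, rules, context):
        self.rules = rules
        self.context = context

    def decide(self):
        for effect, applies in self.rules:
            try:
                if applies(self.context.lookup):
                    return effect
            except KeyError:
                return "Indeterminate"
        return "NotApplicable"


class PEP:
    """Receives requests and enforces the decision rendered by the PDP."""

    def __init__(self, pdp, context):
        self.pdp = pdp
        self.context = context

    def request(self, attributes):
        self.context.store(attributes)
        return self.pdp.decide() == "Permit"


# Hypothetical rule: doctors may read.
rules = [("Permit", lambda get: get("role") == "doctor" and get("action") == "read")]
context = ContextHandler()
pdp = PDP(rules, context)
pep = PEP(pdp, context)
granted = pep.request({"role": "doctor", "action": "read"})
```

A request that lacks the `role` attribute makes the PDP return Indeterminate, which is exactly the situation that the dialog extensions are meant to avoid.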


18.4.1 Credential-Based XACML 


The language extensions that we propose to XACML go beyond the standard ex- 
tension points. All proposed extensions are in line with the semantics of existing 
XACML language constructs though, i.e., we do not alter the semantics of existing 
elements or attributes. 

XACML rules that contain credential requirements can only have effect Permit. 
Rules with effect Deny are pointless as they essentially require that the requester 
does not have a certain credential. Assuming that the requester’s goal is to obtain 
access, she can always pretend not to have the specified credentials. 

Our extensions enable policy authors to express conditions on the credentials 
that a requester must own and the actions that she must perform prior to being granted 
access. To this end, we augment the xacml:Rule element with optional 
CredentialRequirements and ProvisionalActions child elements. The former describes the 
credentials that the requester needs to own and the conditions these credentials have to 
satisfy. The latter describes the actions she has to perform. 


Credential Requirements. 


To express credential-based access control policies, the language needs a way to 
refer to the credentials that bundle several attributes together. Cross-credential con- 
ditions are another important use case: for example, the policy language must allow 
for expressing that the names on a credit card and on a passport must match, or that 
the expiration date of an entry visa is before the expiration date of a passport. 
To this end, CredentialRequirements contains a Credential child element for each 
credential involved in the rule. The Credential can contain AttributeMatchAnyOf 
child elements that allow for comparing an attribute of that credential to a list of 
values. The CredentialRequirements also contain a Condition where conditions on 
the credentials’ attributes can be expressed. Inside a condition, one can refer to an 
attribute AttributeId within a particular credential by means of 
CredAttributeDesignator. An example rule is given in Figure 18.2. 


<Rule Effect="Permit" RuleId="rule2"> 
  <xacml:Condition> 
    <!-- XACML condition relevant for the rule's applicability --> 
  </xacml:Condition> 
  <CredentialRequirements> 
    <Credential CredentialId="pp"> 
      <AttributeMatchAnyOf AttributeId="pl:CredentialType"> 
        <MatchValue MatchId="xacml:string-equal">un:PhotoID</MatchValue> 
      </AttributeMatchAnyOf> 
      <AttributeMatchAnyOf AttributeId="pl:Issuer"> 
        <MatchValue MatchId="xacml:anyURI-equal">http://www.usa.gov</MatchValue> 
      </AttributeMatchAnyOf> 
    </Credential> 
    <Condition> 
      <xacml:Apply FunctionId="xacml:date-less-than-or-equal"> 
        <CredentialAttributeDesignator CredId="pp" AttributeId="un:DateOfBirth"/> 
        <xacml:Apply FunctionId="xacml:date-subtract-yearMonthDuration"> 
          <xacml:EnvironmentAttributeDesignator AttributeId="xacml:current-date"/> 
          <xacml:AttributeValue DataType="xs:duration">P21Y</xacml:AttributeValue> 
        </xacml:Apply> 
      </xacml:Apply> 
    </Condition> 
  </CredentialRequirements> 
  <ProvisionalActions> 
    <ProvisionalAction ActionId="pl:Reveal"> 
      <xacml:AttributeValue DataType="xs:anyURI">un:Sex</xacml:AttributeValue> 
      <xacml:AttributeValue DataType="xs:anyURI">pp</xacml:AttributeValue> 
    </ProvisionalAction> 
  </ProvisionalActions> 
</Rule> 


Fig. 18.2: Example rule stating that access is granted to users who are at least 
twenty-one years old according to a piece of PhotoID issued by the US govern- 
ment, but only after revealing the gender mentioned on the same piece of PhotoID. 
Namespace prefix xacml refers to the XACML 3.0 namespace, xs to XML Schema, 
pl to http://www.primelife.eu, and un to http://www.un.org. 
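
The age condition of the rule in Figure 18.2 compares the date of birth with the current date minus a duration of twenty-one years. The sketch below mirrors that comparison; the function names are chosen here for illustration, and mapping 29 February to 28 February in non-leap years is an assumption about how the subtraction is carried out.

```python
from datetime import date


def subtract_years(d: date, years: int) -> date:
    """Subtract whole years from a date, clamping 29 February to 28 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # Only reachable for 29 February when the target year is not a leap year.
        return d.replace(year=d.year - years, day=28)


def at_least_21(date_of_birth: date, current_date: date) -> bool:
    """DateOfBirth <= current-date - P21Y, as in the Condition of the rule."""
    return date_of_birth <= subtract_years(current_date, 21)
```

For a current date of 1 May 2011, a requester born on 1 May 1990 satisfies the condition, while one born a day later does not.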


Conditions on credential attributes are expressed using the same schema as the
xacml:Condition element (extended by CredAttributeDesignator), but are contained
in a separate Condition child element of the CredentialRequirements element.
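
The condition in Figure 18.2 amounts to a date comparison: the date of birth must not
be later than the current date minus 21 years. A minimal Python sketch of that check
is given here. The function names and the treatment of 29 February are assumptions
made for illustration; they are not part of the policy language.

```python
from datetime import date


def subtract_years(d: date, years: int) -> date:
    """Return d minus a whole number of years (cf. date-subtract-yearMonthDuration).

    Assumption: 29 February maps to 28 February when the target year is not a leap year.
    """
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def satisfies_age_condition(date_of_birth: date, today: date, min_age: int = 21) -> bool:
    """Mirror date-less-than-or-equal(DateOfBirth, current-date - P21Y)."""
    return date_of_birth <= subtract_years(today, min_age)
```

With an anonymous credential, the requester would prove this predicate without
revealing the date of birth itself; the sketch only shows what is being proven.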


Provisional Actions. 


The ProvisionalActions element contains the actions that have to be performed by
a requester prior to being granted access. The types of actions that we model are
the revealing of attributes (optionally to third parties and optionally with an attached
data handling policy) and the signing of statements (to express consent). Each
provisional action is contained in a ProvisionalAction element that includes an action
identifier as an attribute ActionId. For example, the policy in Figure 18.2 contains
the requirement to reveal the gender as specified on the identity card.


18.4.2 SAML as Claims Language 


Here we describe how we extend SAML for transporting the claims as defined in the 
beginning of this Section. SAML is a standard allowing for the exchange of certified 
attributes bundled together into assertions, which are structured similarly to credentials.
The standard, however, only allows for the exchange of attribute values but not
conditions on such values nor notifications of provisional action fulfilment. To ad- 
dress these issues, we use the standard’s extension points to embed our Condition 
and ProvisionalAction elements into SAML assertions. 


18.4.3 XACML Architecture Extensions 


In the following, we sketch how we adapt the XACML architecture such that (1) 
the credential-based XACML policy applicable to a request is communicated to the 
requester, and (2) the policy can be evaluated on the basis of the provided SAML 
claim. The modified architecture maintains all standard XACML functionality, i.e.,
the modifications are extensions that do not substitute existing functionality and that 
are usable in combination with standard features. 

We adapt the XACML communication model to allow the following two-round
pattern. In the first round, the requester specifies a resource and obtains the
relevant policy from the PEP; in the second round, the requester sends the same 
request with an additional SAML claim. Resending the request is necessary be- 
cause the XACML architecture is stateless, meaning that the individual components 
do not maintain information across multiple rounds. A PEP’s response in the first 
round is embedded in an XACMLPolicy Assertion element (cf. SAML profile of 
XACML [OAS05b]), to which the requester is supposed to reply with an appropri- 
ate SAML claim. The PEP grants or denies access depending on the claim’s validity 
and the decision of the PDP. 

We need to modify the PEP such that it obtains all policies applicable to a 
user’s request and then sends them in a pre-evaluated version to the user. The pre- 
evaluation substitutes known attributes, e.g., environment attributes such as time and 
date, with concrete values. 
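
The pre-evaluation step can be pictured as a substitution over the policy document.
The following Python sketch replaces environment designators whose value is known
with literal values and leaves everything else untouched. It works on a simplified
fragment without namespace prefixes; the element layout follows Figure 18.2, but the
function name and the dictionary of known values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified condition from Figure 18.2 (namespace prefixes omitted for brevity).
CONDITION = """
<Condition>
  <Apply FunctionId="date-less-than-or-equal">
    <CredentialAttributeDesignator CredId="pp" AttributeId="un:DateOfBirth"/>
    <Apply FunctionId="date-subtract-yearMonthDuration">
      <EnvironmentAttributeDesignator AttributeId="current-date"/>
      <AttributeValue DataType="xs:duration">P21Y</AttributeValue>
    </Apply>
  </Apply>
</Condition>
"""


def pre_evaluate(condition_xml: str, environment: dict) -> str:
    """Replace environment designators with literals where the value is known."""
    root = ET.fromstring(condition_xml)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "EnvironmentAttributeDesignator":
                continue
            attribute_id = child.get("AttributeId")
            if attribute_id not in environment:
                continue  # unknown attributes stay symbolic
            value = ET.Element("AttributeValue", {"DataType": "xs:date"})
            value.text = environment[attribute_id]
            value.tail = child.tail  # keep the original whitespace layout
            parent.remove(child)
            parent.insert(index, value)
    return ET.tostring(root, encoding="unicode")
```

Calling pre_evaluate(CONDITION, {"current-date": "2011-01-01"}) yields a condition
in which only the credential attribute remains to be supplied by the requester.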

When the PEP receives a request with an attached SAML claim, it has to verify 
the validity of the claim and make it available to the PDP. To verify the validity of 
the claim evidence, we extend the PEP with an evidence verifier component (cf.
Figure 18.1). For every supported credential technology, this component has a plug-in
that can verify evidence specific to this technology. To make the claim available to
the PDP, we introduce a claim handler component within XACML’s context
handler. If the claim is valid, the PEP forwards it to the claim handler, which buffers it
so that it can be retrieved by the PDP. The PEP then forwards the request (without 
attached claim) to the PDP. 
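
The interplay of the two rounds, the evidence verifier plug-ins and the claim handler
can be sketched as follows. This is an illustrative Python model, not the PrimeLife
implementation: all class, method and field names are hypothetical, the PDP is
represented by a plain callable, and answering an invalid claim with Deny is an
assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Claim:
    technology: str  # hypothetical label of the credential technology
    statement: dict  # what the claim asserts
    evidence: str    # technology-specific proof, opaque to the PEP


@dataclass
class ClaimHandler:
    """Buffers a verified claim so that the PDP can fetch it."""
    _buffer: Dict[str, Claim] = field(default_factory=dict)

    def store(self, request_id: str, claim: Claim) -> None:
        self._buffer[request_id] = claim

    def fetch(self, request_id: str) -> Optional[Claim]:
        return self._buffer.pop(request_id, None)


class PEP:
    def __init__(self, policy_for, decide, claim_handler):
        self.policy_for = policy_for  # resource -> pre-evaluated policy
        self.decide = decide          # stands in for the PDP
        self.claim_handler = claim_handler
        self.verifiers: Dict[str, Callable[[Claim], bool]] = {}

    def register_verifier(self, technology, verifier):
        """Add one plug-in per supported credential technology."""
        self.verifiers[technology] = verifier

    def handle(self, request_id, resource, claim=None):
        if claim is None:
            # Round 1: no claim yet, so answer with the applicable policy.
            return {"policy": self.policy_for(resource)}
        # Round 2: verify the evidence with the matching plug-in.
        verifier = self.verifiers.get(claim.technology)
        if verifier is None or not verifier(claim):
            return {"decision": "Deny"}
        self.claim_handler.store(request_id, claim)
        # The request goes to the PDP without the claim; the PDP fetches it itself.
        return {"decision": self.decide(request_id, resource)}
```

Because the PEP keeps no state between the rounds, the second call carries the
resource again, which mirrors the resending of the request described above.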

A PDP evaluates a request from the PEP as usual with respect to the rules in 
the policy. However, rules with credential requirements or provisional actions are 
treated specially. For such a rule to yield a Permit decision, not only its applicability
(specified by its target and condition) is relevant, but also the fulfillment of any
credential requirements and provisional actions it specifies. If it specifies any, the PDP
fetches the claim from the claim handler. We extend the PDP with a rule verifier
component that, for given credential requirements, given provisional actions, and a 
given claim, decides whether the claim implies the requirements and fulfills all the 
provisional actions. If so, then the rule evaluates to Permit, otherwise it evaluates 
to Indeterminate. 
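
The decision logic of the rule verifier can be summarised in a few lines. The Python
sketch below assumes a rule with Effect="Permit", models credential requirements as
predicates over the claim and provisional actions as tuples; these representations
and the key fulfilled_actions are hypothetical.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Action = Tuple[str, ...]  # e.g. ("pl:Reveal", "un:Sex", "pp")


def evaluate_rule(
    applicable: bool,
    requirements: Iterable[Callable[[dict], bool]],
    provisional_actions: Set[Action],
    claim: Optional[dict],
) -> str:
    """Evaluate a Permit rule with credential requirements and provisional actions."""
    if not applicable:
        return "NotApplicable"
    requirements = list(requirements)
    if not requirements and not provisional_actions:
        return "Permit"  # ordinary XACML rule, no claim needed
    if claim is None:
        return "Indeterminate"
    # The claim must imply every requirement and fulfil every provisional action.
    implied = all(requirement(claim) for requirement in requirements)
    fulfilled = provisional_actions <= set(claim.get("fulfilled_actions", ()))
    return "Permit" if implied and fulfilled else "Indeterminate"
```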


18.5 Concluding Remarks 


By means of the research work illustrated in this chapter, we aimed at advancing 
access control technology with a specific focus and a wide-ranging impact: enabling 
all involved parties to request, evaluate, grant, and obtain access to services and 
data in a way that fulfills their privacy preferences in the best possible way. For the 
current state of the art to progress, many requirements were met, regarding both 
policy languages in general, and access control in particular. 

Our main focus on anonymous credentials has obviously played a fundamental 
role in fully achieving the goals on data minimisation and anonymous or pseudony- 
mous access control, brought even further by the introduction of sanitisation tech- 
niques to meet servers’ privacy issues. 

Our credential model, consisting of a list of signed attributes, provides solid
support for both role-based and attribute-based access control. Moreover, we have
elaborated a declarative language that allows for the expression of high-level, com- 
pact policies that are easier to understand than the proposals made so far in the 
field. The policy model is indeed technology-independent, which greatly widens its 
application scope. 

The simple structure of each credential paves the way for quick yet effective 
construction of ontologies of credential types and attributes, which not only allow 
for even more compact policies, but also support delegation mechanisms through 
sequences of credentials embodying chains of trust between certification authorities. 

Finally, we embedded our proposals into an existing standard like XACML, 
whose policy prioritization and combination mechanisms are thus made available 
to policy editors. 


Chapter 19 
Legal Policy Mechanisms 


Leif-Erik Holtz and Jan Schallaböck


Abstract Transparency is one of the core principles of data protection legislation in
Europe, beyond Europe and all around the world. The European understanding
differs from the American one in that individuals should be aware of ‘who knows
what about them’. Often enough, this understanding is hard to enact, to enforce and
above all to make understandable to the user, because the user is confronted with a
multitude of different purposes for data handling, often hidden in the lengthy legal
text of privacy notices, especially when surfing the web. Therefore, a number of
approaches are currently trying to tackle this problem by offering the user tools and
mechanisms for a better understanding of what is happening with their data. The work
presented in this chapter is an outcome of PrimeLife’s research on Next Generation
Policies. It aims at a better understanding of the legal aspects of the processing of
personal data by looking at the current status of this processing in different contexts
and structuring these.


19.1 Introduction 


The research on legal policy mechanisms aims at developing the basis for taxonomies
and partonomies that can be used as a starting point in defining vocabulary for
technical and legal privacy policies and languages expressing the latter. What is the
purpose of this research approach? Why are taxonomies and partonomies important
means for defining vocabulary for privacy policies?

As described above, transparency is one of the core principles of data protection
legislation in Europe [Com95], beyond [APE05] and all around the world [OEC80].
The common understanding is that individuals should be aware of ‘who knows what
about them’ [Cou83]. This concept is supported by the principle of purpose or
collection limitation. This principle stipulates that any collected personal information
may only be processed for those purposes it was collected for. Roots of this principle
can be seen in what Nissenbaum calls ‘contextual integrity’ [Nis04, Nis10, Nis98],
or equally in the sociological concept of ‘functional differentiation’ [Luh77]. They
appear to be basic conditions of just communication and social interaction in
democratic societies. Often enough, these principles are hard to enact, to enforce and
above all hard to understand for a user. The user is confronted with a multitude of
different purposes, often hidden in the lengthy legal text of privacy notices, especially
when surfing the web.

A number of approaches are currently trying to tackle this problem by offering
the user tools and mechanisms for a better understanding of what is happening
with their data (see below in the references). However, in most if not all of these
approaches it is unclear what is actually necessary to communicate. The problem
relates to the above. For a higher level of transparency, the user should be made
aware of what actually happens to the data, who is processing it, and, if data is
collected without the informed consent of the user, what data is processed (e.g.,
cookies, IP addresses, clickstreams etc.). While the latter elements may be easy to
communicate (but still might need some further thought), the question of how to
express in a simple way how the data are processed, and for what purpose(s) they are
collected, poses difficulties. The multitude of applications and uses of personal data is
highly unstructured, as no comprehensive ontology exists and no abstractions are
apparent.

For developing a typology of the processing types of personal data an empirical 
approach, looking at the current practice, seems appropriate. 

The work in PrimeLife’s Activity 5 on Next Generation Policies (cf. Part IV 
of this book), especially in the area of legal policy mechanisms, aims at a better 
understanding of the legal aspects of the processing of personal data, by looking at 
the current status of this processing in different contexts and structuring these. 


19.2 Legal Framework for Processing Personal Data 


Recent and ongoing work in user transparency and legal privacy policies is done 
in the area of Human-Computer Interfaces (HCI), as well as in technical
representations and functional descriptions of policies for privacy and data protection. The
Article 29 Working Party has endorsed the use of multi-layered legal policies for 
websites [Par04], which has been implemented in several places on the web [Bro08]. 
Others have been developing tools and interfaces to support the user [Con07a]. Fi- 
nally there are proposals to use iconography to simplify the recognition of a le- 
gal privacy policy for the user [Run06] [Fis06] that have gained some momentum 
[Pri09c]. 

For a formal description of privacy policies, P3P is an available specification
[W3C01], and offers some structural reference. XACML [OAS08], EPAL [W3C03],
Liberty’s Identity Governance Framework [Pro07], and WS-Policy [W3C06b] are
discussed for further expressing rules for the processing of personal data. However,
all of these specifications are very limited, or even completely lacking, in offering
conventions on describing the aspects most relevant to the user: What is actually
going to happen to the data, what purposes are they used for, under what conditions
and with what obligations?

Last but not least, the PRIME project [Con07b] initiated some research towards
a more thorough ontology in this area and reached some first results, which
further research should take into account as well.

General legal requirements for processing personal data are defined in the Data
Protection Directive 95/46/EC (DPD) and the Directive on Privacy and Electronic
Communications 2002/58/EC (DPEC). DPD and DPEC mandate member states of
the EU to implement their normative content into their respective jurisdictions. The
state of the implementation is documented by the EDPS (European Data Protection
Supervisor). Legal requirements for privacy policies encompass data collection as
well as data handling, i.e. the processing. The core element of the DPD is that
personal data may only be collected and processed if there is a legal basis or if
the data subject has unambiguously given his consent, cf. Article 7 of the DPD.

In any case of data processing, it is always worthwhile to review the six legal
grounds for data processing to ensure that one of them is present and that the data
are therefore processed legitimately [Kun07].

The legal grounds are the following: 


• The data subject has unambiguously given his consent (see Article 7 (a); all
following Articles refer to the DPD); or
• Processing is necessary


– For the performance of a contract to which the data subject is party, or in order
to take steps at the request of the data subject prior to entering into a contract
(see Article 7 (b)); or

– For compliance with the legal obligations to which the controller is subject
(see Article 7 (c)); or

– In order to protect the vital interests of the data subject (see Article 7 (d)); or

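– For the performance of a task carried out in the public interest or in the
exercise of official authority vested in a controller or in a third party to whom the
data are disclosed (see Article 7 (e)); or

– For the purposes of the legitimate interests pursued by the controller or by the
third party or parties to whom the data are disclosed, except where such interests
are overridden by the interests or fundamental rights and freedoms of the data
subject (see Article 7 (f)).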


If the collection is based on the latter, the following information has to be
provided to the data subject (amongst others):


• The data controller has to be declared, as well as

• The types of data collected, and

• The legitimate purpose of the processing has to be defined, cf. Articles 6 and 7 of
the DPD.


In the further process, the data controller has to ensure that the data is kept
accurate and only used for the purposes declared ex ante. While the controller and the
types of data are relatively easy to define, a specific problem of the legal requirements
is defining the purpose of the data collection.
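
The review of legal grounds and of the information to be declared, as described
above, can be expressed as a simple completeness check. The following Python sketch
is only an illustration: the record structure and the function name are hypothetical,
and the enumeration lists the grounds of Article 7 (a) to (f) of the DPD.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class LegalGround(Enum):
    CONSENT = "Article 7 (a)"
    CONTRACT = "Article 7 (b)"
    LEGAL_OBLIGATION = "Article 7 (c)"
    VITAL_INTERESTS = "Article 7 (d)"
    PUBLIC_TASK = "Article 7 (e)"
    LEGITIMATE_INTERESTS = "Article 7 (f)"


@dataclass
class ProcessingRecord:
    controller: str
    data_types: List[str]
    purpose: str
    legal_ground: Optional[LegalGround]


def missing_items(record: ProcessingRecord) -> List[str]:
    """List what is still missing before the processing can be called legitimate."""
    missing = []
    if record.legal_ground is None:
        missing.append("legal ground")
    if not record.controller:
        missing.append("controller")
    if not record.data_types:
        missing.append("data types")
    if not record.purpose:
        missing.append("purpose")
    return missing
```

Such a check can establish that a purpose has been declared, but not whether it is
described precisely enough, which is the difficulty discussed in this chapter.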

The difficulty of how precise the policy language has to be can also be seen in the
light of the DPD. According to Article 12, lit. a, bullet point 3, the data subject has the
right of information about the data handling. This is also an expression of the data
subject’s right of informational self-determination. Therefore it is necessary
that the data collector displays at least the applicable information about data handling
in his policy. The data collector also needs a precise policy to comply with his
promises when processing the data internally in his own business. To ensure the
implementation of legal requirements during data handling, the data collector has
to advise his employees on how to process the collected data. To avoid mistakes in
handling data, the advice has to be as precise as possible.


19.3 Gaps in Current Policy Language Approaches 


Sticky policies promise to improve the state of the art in data protection, both on 
the level of better control, and on the level of increasing transparency. A sticky 
policy is usually the result of an automated matching procedure between the data 
subject’s data handling preferences and the data controller’s data handling policy. 
Associated with a resource, it is the agreed-upon set of granted authorisations and
promised obligations with respect to that resource. Sticky policies are policies that
control how data is to be accessed and used and that accompany data throughout an 
entire distributed system [CL08]. 
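
To illustrate the matching procedure, consider the following Python sketch. The
matching semantics are an assumption chosen for simplicity (the controller may not
request more authorisations than the data subject grants and must promise every
obligation the data subject requires); the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataHandling:
    authorisations: FrozenSet[str]  # permitted uses, e.g. "use:delivery"
    obligations: FrozenSet[str]     # promised duties, e.g. "delete-after-30-days"


def match(preferences: DataHandling, policy: DataHandling) -> Optional[DataHandling]:
    """Return the sticky policy, or None if preferences and policy do not match."""
    if not policy.authorisations <= preferences.authorisations:
        return None  # the controller asks for more than the data subject grants
    if not preferences.obligations <= policy.obligations:
        return None  # a required obligation is not promised
    # The agreed-upon sets travel with the data as its sticky policy.
    return DataHandling(policy.authorisations, policy.obligations)
```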

A privacy policy that would support privacy by way of sticky policies would need 
to implement legal requirements for the employees of the data collector as well as 
for the data subject. The maxim of transparency in data collecting and data handling 
also argues for the assumption that a privacy policy has to be as precise as possible. 
This clarifies the need for a precise and well-implemented privacy policy. A number
of different and “complementary” approaches are currently being taken to support
legal compliance in current IT systems.

The goal of these approaches is to support compliant use of the system by
mixing technological and organisational mechanisms. The legal framework is ideally
already introduced when defining the specification of the system. Numerous policy
languages currently available or under development address this. In the following,
two approaches, XACML (extended) and P3P, will be highlighted and analysed with
respect to their potential to support legal compliance.


19.3.1 XACML 


The eXtensible Access Control Markup Language (XACML) is an XML-based language
for expressing and interchanging access control policies [Pri09b, page 48]. The
language’s core functionalities are geared towards access control, but it also offers
standard extension points for defining new functions, data types, policy combination
logic and more. In addition to the language, XACML defines both an architecture for
the evaluation of policies and a communication protocol for message interchange.
It also supports multiple subject specifications in a single policy, multi-valued
attributes, conditions on metadata of the resources and policy indexing. XACML
is an applicable mechanism for technically implementing legal requirements of
access control. The control of data handling, on the other hand, is not fully covered by
XACML, but may potentially be achieved by means of its extensibility. The benefits
of XACML on the one hand and the shortcomings on the other hand have already
been depicted in detail in Chapter 20.

During the development of PrimeLife, some effort has been spent on extending 
XACML. This work suggests a new obligation handling mechanism, taking into 
account temporal constraints, preobligations, conditional obligations and repeating 
obligations together with a down-stream usage authorisation system, defining the 
access control rules under which personal information collected by an entity can be 
forwarded to a third party [ABD+10].

Part of this work is based on the concept of trusted credentials. XACML was 
extended with data handling and credential capabilities. The overall structure of 
XACML was maintained, and a number of new elements to support the advanced 
features the language offers were introduced. It can be used by the data controller as 
well as by the data subject. The result of the completed research towards
XACML-PrimeLife is that it is a suitable way to display the data collecting party, but that
there is still difficulty in displaying the purpose of the data processing. Currently, the
research forms a wrapper in this regard that can be used to contain elements of the
Platform for Privacy Preferences (P3P), but could also include different ontologies.


19.3.2 P3P 


For a formal description of privacy policies, P3P is an available specification that 
offers some structural reference [W3C01]. P3P enables websites to express their 
privacy practices in a standard format that can be retrieved automatically and inter- 
preted easily by user agents. 

Concern has been raised that P3P may be too complicated to be understandable 
for users. However, P3P has the advantage of describing the aspects most relevant to 
the user: what is actually going to happen to the data, what purposes they are used 
for and under which conditions and with what obligations [Pri09a, page 28]. 

Nevertheless, the expressiveness of P3P, especially in regard to describing purposes,
is limited (albeit extensible). The predefined set of purposes was limited to so-called
secondary purposes in the first specification (http://www.w3.org/TR/P3P/),
which was then complemented with a flat partonomy of some twenty “primary
purposes” in Version 1.1 [W3C01]. Taking into account the requirements for displaying
the purpose to the user, as well as the requirements of internally achieving legal
processing of the data within the limits of the purpose, this appears to be too limited, as
we show in a number of scenarios (below, Section 19.5). It will need further
extension in its vocabulary, potentially by way of a more comprehensive ontology,
or at least a taxonomy or partonomy.


19.4 Methodology


To derive a basis for developing a vocabulary following the criteria specified for 
sticky policies, empirical approaches were analysed and evaluated with regards to 
their value for developing such a basis: 


• The German corpus juris, especially those norms in federal law allowing the
processing of personal data by public bodies and agencies,

• “Verfahrensverzeichnisse”, a regulation specific to German data protection law:
it mandates data controllers to maintain a list of those processes within which
they are processing personal data, cf. § 4 lit. e BDSG,

• Privacy statements from the Internet, possibly in cooperation with further
entities,

• PrimeLife use cases.


The goals of such an approach are to find and to define vocabularies for the 
following attributes: 


• Processes and services for personal data,

• Purposes of processes and a partonomy/taxonomy thereof, to be sorted by
relevance,

• Typical sets of data, expressed as a partonomy of data types and data sets,

• Data types and possibly qualifications/attributes (such as sensitive information as
defined by 95/46/EC),

• Reference to the legal basis for the processing (norms, legal privacy policies),

• Text elements from legal privacy policies,

• Possible further elements, especially obligations, such as logging, deletion,
blocking, further information (e.g., in case of incidents), retention periods (currently
not included in the research),

• Categories of data processors, and a partonomy/taxonomy thereof (currently not
included in the research).


It is due to the very nature of data processing that it is impossible to construct a
comprehensive ontology for this topic, but there seems to be the option of an
advanced taxonomy, and possibly an ontology, which would cover processing to a
certain, possibly defined, level of detail and brevity.
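
The attributes listed above suggest a simple data model: a taxonomy of terms plus
a description of each processing operation that refers to it. The following Python
sketch shows one possible shape. All names are hypothetical, and placing ‘age
verification’ below ‘eligibility’ in the usage note is merely an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Taxonomy:
    """Maps each term to its parent; top-level terms have the parent None."""
    parents: Dict[str, Optional[str]] = field(default_factory=dict)

    def add(self, term: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.parents:
            raise KeyError("unknown parent term: " + parent)
        self.parents[term] = parent

    def path(self, term: str) -> List[str]:
        """Return the chain of terms from the top level down to the given term."""
        chain = []
        current: Optional[str] = term
        while current is not None:
            chain.append(current)
            current = self.parents[current]
        return list(reversed(chain))

    def is_a(self, term: str, ancestor: str) -> bool:
        return ancestor in self.path(term)


@dataclass
class ProcessingDescription:
    process: str           # process or service handling the data
    purpose: str           # term from the purpose taxonomy
    data_types: List[str]  # terms from the data type partonomy
    sensitive: bool        # qualification as defined by 95/46/EC
    legal_basis: str       # reference to the norm or policy
    policy_text: str = ""  # text element from the legal privacy policy
```

For example, after purposes.add("eligibility") and purposes.add("age verification",
"eligibility"), a policy statement about ‘eligibility’ can be recognised as covering
age verification by means of is_a.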


19.4.1 Looking into Privacy Policies 


Our research, conducted on a limited number of privacy policies, has shown
differing results on the usability of the analysed policies. We have analysed 34 privacy
policies, most of them from Dutch websites and therefore most of them under the
legislation of the DPD. The analysis yielded the following results: some
of the privacy policies implemented the legal requirements for collecting data in a
legally compliant manner while others did not. The implementation of legal
requirements for data handling, on the other hand, was not displayed very well. The main
problem in privacy policies seems to be the description of data handling. This is
also a legal requirement, but only a few policies implemented it. More detailed
information about the analysis can be found in PrimeLife Heartbeat H5.2.2.

Many of the analysed policies did not have the level of transparency that they
should have provided, which raised some concern as to whether they complied with
the requirements of the legal framework in place. This alone would certainly have
been an interesting field for further research, but was not within the scope of the
research conducted.

Another difficulty that arises when analysing such policies is a possible bias of
interpretation. One approach to balance this bias would have been to have each
policy analysed by two researchers, with a third looking at those policies where the
previous results differed. This approach, although promising, was dismissed in view
of the resources available for the research, but it should be taken into
consideration for further research.


19.4.2 Looking at the Law 


The German corpus juris appeared to be another interesting empirical basis. Due
to the specific construction of German law, the processing of personal data by
governmental agencies needs a specific legal basis; therefore a broad data set was to
be expected.

After a selection of laws had been analysed, the evaluation hinted that
this approach might not be the most effective. The specific language chosen in many
cases remained at too high a level to yield results of the necessary
granularity. Again, this indicates that even lawmakers do not achieve a reasonable level of
transparency in their laws, which unfortunately also makes this approach inefficient.

A similar, but slightly different approach based on German law was using
German Verfahrensverzeichnisse (literally: processing directories). This specificity
of German law mandates data controllers to describe each process wherein
personal data is processed. While this approach seemed promising, the available
material from the German Verfahrensverzeichnisse was too limited to come to an
effective result. Although controllers are obliged to have these directories, they are
often not kept up to date. It was expected that organisations would react with an
outcry if asked to provide their Verfahrensverzeichnisse, especially when asked by a
data protection authority, which is the partner employing the researchers conducting
the research. Even though this approach was not pursued because of these
implications, it may have triggered further action within the agency, which may allow
this approach to be followed at a later point in time.

Further research concentrated on selected use cases. On the one hand,
working with use cases has the disadvantage that a very comprehensive ontology
does not seem to be in reach on this basis. On the other hand, there are several
advantages in working with use cases. For example, each possible scenario could
be analysed quite comprehensively, and the real-life problems in connection with
privacy protection can be displayed very well.


19.5 Use Cases 


During the course of PrimeLife, the project developed a set of use cases. Thus there
was already a strong foundation to build upon, and the approach promised
a high level of compatibility with other research aspects within the project. One of
the use cases that was analysed and will be analysed further is the scenario of a
‘classical’ online shop (which had already been a good scenario to display the
problems concerning privacy protection during the PRIME project). Subsequently, the
methodology was also applied to a social network site use case. Social network
sites involve a higher degree of complexity related to purposes of data handling
because many different constellations may appear and many different data controllers
or processors respectively may exist. Given the complexity of social network sites,
we were also able to assess the limits of the methodology.


19.5.1 Online Shopping 


For reasons of brevity, the use cases can only be described by example. For this
chapter, age verification was chosen, as it has a number of interesting
implications. A more comprehensive overview can be obtained from Fig. 19.1.

Coming back to the example: one of many questions in a web shop scenario
is the task of age verification. If a shop such as Amazon sells comics like
Donald Duck, there is no need to verify the age of the user. The comic has no content
that endangers young people and is therefore free of age limitations.

A typical case of national laws allowing minors to enter into minor contracts
is the German ‘Taschengeldparagraph’, § 110 BGB, allowing minors to conclude valid
contracts with their pocket money. Here, there is no need, and no legal basis, for
collecting the date of birth. If the same shop sells alcohol, such as wine, there is
an age limitation, and the legal basis for collecting the date of birth is the
necessity to comply with the legal obligation of not selling alcohol to minors.
Therefore, the privacy policy has to display the purpose for collecting personal data
such as the date of birth. Another question concerning the web shop is which data
is needed for payment. This depends on many parameters. One conclusion of the
research completed so far and of the estimated results is that research on concrete
scenarios will lead to a usable result. The past and further research on the concrete
scenario of an online shop might approximate a number of problems concerning
data protection.
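
The age verification example boils down to a rule that ties the collection of the date
of birth to the contents of the order. A minimal Python sketch follows; the category
labels and the function name are hypothetical, and the reference to Article 7 (c)
reflects the legal obligation named above.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical product categories that are subject to an age limitation.
AGE_RESTRICTED = {"alcohol"}


def date_of_birth_requirement(categories: Iterable[str]) -> Tuple[bool, Optional[str]]:
    """Decide whether the shop may ask for the date of birth, and for what purpose."""
    restricted = sorted(set(categories) & AGE_RESTRICTED)
    if not restricted:
        # No age limitation applies: no need and no legal basis for collection.
        return False, None
    purpose = "age verification for " + ", ".join(restricted) + " (Article 7 (c) DPD)"
    return True, purpose
```

The returned purpose string is what the privacy policy would have to display to the
user before the date of birth is requested.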

After a thorough analysis it was clear that a simple scenario such as the one on
‘Online Shopping’ can yield a plenitude of different data handling purposes and
obligations. For a better understanding of the different purposes, we tried to define
a selection of them, inter alia with their possible legal basis.


[Figure: mind map centred on ‘Purposes, Online-Shopping’. Node labels: Payment;
Delivery; Standard Parcel; Packetbox; different addresses for delivery and invoices;
Information via telephone?; Calling back of a product; Legal purposes; Money
Laundering; Taxation; Marketing; Public Wish-lists; Customer loyalty programmes;
New customer rebate systems; "tell-your-friends"; Recommendation Systems;
Eligibility; Age Verification; Creditworthiness; Prescriptions for Pharmacies;
Legitimation to buy / bear weapons.]


Fig. 19.1: Overview of different purposes in an online shopping scenario. The
figure shows the multitude and hierarchy of purposes. The latter, however, may be
misleading: often a definite hierarchy of purposes cannot be described.


With regard to some of the scenarios, the collection may also already be legal
simply because it is “necessary for the purposes of the legitimate interests” of the
shop (cf. Art. 7, lit. f of the DPD), but in many other scenarios the controller will
rely on the user actively giving consent to the processing of the information. To
inform the user broadly, the complete range of purposes for collecting data would have
to be displayed. Such an amount of information, however, would make an informed
decision impossible. On the other hand, broad information is the precondition for
active consent of the user to the collection of his personal data. Therefore the
purposes of all scenarios of the use case ‘online shop’ have to be displayed in a way
that the user can understand. The analysis of the web shop use case illustrated that
purposes of data handling can be visualized quite well in this use case and be made
specific on a very granular level.
From this perspective the web shop use case appears to be a very well-defined case.
Purposes can be differentiated clearly and depicted relatively easily.


19.5.2 Social Networking 


Social Networking implies some very specific legal questions, many of which are
currently under heavy discussion within jurisprudence. It may therefore very well
be considered one of the more difficult current legal challenges in privacy
legislation. It therefore seemed reasonable to believe that if a model could be
developed for Social Networking, it would be possible to cover other areas with a similar
approach. Moreover, better understanding privacy in Social Networks is one
of the core research areas for PrimeLife, making the study of a social network use
case a logical step.

Purposes in social network sites can vary greatly, as will be shown. The question 
was: how can purposes in social network sites possibly be defined? According to 
the definitions of social network sites, this depends on different factors. One of the 
factors is the differentiation between those purposes defined by the provider of the 
service and those defined by the user. 

Provider-defined purposes in social network sites can be quite complex. The pur- 
poses for data handling depend inter alia on the different purposes of the social net- 
work sites per se. For example, in those social network sites that involve primarily 
commercial or business objectives, other purposes may occur than in social net- 
work sites addressing leisure and recreational activities. We can therefore conclude 
that business oriented networks tend to define purposes for data handling mainly 
to promote the career benefits for the users. Other social network sites have differ- 
ent purposes such as “getting in touch with old classmates or friends” and “getting 
concrete answers to questions inside the network” [Lin10]. Such a network that is 
purely career oriented obviously defines other purposes for data handling. Apart 
from the network’s business model and niche, many different functionalities can be 
used within social networks. For example, Facebook offers functionalities such as
mobile phone usage or SMS usage [Fac10]. The different applications again have
different purposes and therefore different purposes for data handling. Thus, social 
network sites offer a (potentially unlimited) wide variety of user defined purposes 
for status messages and thereby for data handling. 

Examples of provider-side defined use cases are therefore:


• Status messages, including status messages in closed user groups and public
status messages,

• Designing own profiles,

• Personal messaging,

• Social search,

• Uploading photos, including tagging, commenting and location data.


On the other hand there are user-defined purposes, which can differ largely from
provider-defined purposes. Users define the purposes for data handling themselves
on a case-by-case basis, which makes these purposes very difficult to assess on the
system side ex ante. User-defined purposes also tend to be shaped more by the usage
culture between users. In this area the research therefore followed a slightly
different approach, which eventually was integrated into a separate project: one way
of predefining purposes in a user-defined scenario might be the privicon approach
[CSBS], where users can predefine how the recipient of an email should handle the
mail.


19.6 Results and Further Research 


The research completed and described within this chapter leads to the conclusion
that many data controllers act on the assumption that it is precise enough to state
the legitimate reason of the data collector for handling data as a legal basis (see
German Federal Data Protection Law (BDSG) [Bun08], article 28, paragraph 1, number 2
or article 7, letter d of the DPD) as part of their policies. This ‘catchall element’
(legitimate reason) is used as a general reason (BDSG, article 28, marginal number 1).
However, this should not be the state of the art of privacy policies, as it does
not allow a reasonable level of transparency for the data subjects.

Our research has shown that, to avoid this in the future, a more precise description
of policies is necessary. Moreover, it has also demonstrated that this can be
done, at least in the selected cases. The completed work includes and compares
different methodological approaches: research based on use cases as well as
research with larger empirical bases. All approaches had a common goal in mind:
the need for access control policy languages that, on the one hand, provide access
control functionality and, on the other hand, protect the privacy of the involved
parties and of their personal information.

Careful analysis showed that broader empirical approaches, although promising,
would require extensive effort. To be more precise: the empirical analysis of
German data protection law alone, and of the privacy policies of German sites only,
would be immense. An analysis of published privacy policies yielded similar
limitations. Especially when trying to effectively rule out bias of interpretation,
the effort doubles or triples. The completed empirical research also leads to the
conclusion that all analysed approaches have deficiencies, as was shown above. This
is why a mixed approach was taken: empirical research is supplemented by use
case-based analyses. This approach did not promise the comprehensiveness of an
empirical analysis, but rather focused on exemplifying the required expressivity of
the language. It has, however, the advantage of being able to look deeper into the
technical processes underlying the respective policies, rather than solely looking at
what is published in privacy policies. One of the benefits of this approach is that
it is easier to handle a known system as a basis of research than to handle
unknown systems.

Nevertheless, future research will have to take all possible options into account,
if a more comprehensive overview of data handling practices is to be achieved. 
A draft database structure for collecting such material was also developed as part 
of the research. The use case scenario ‘online shopping’ has also suggested that 
there are certain shortcomings in displaying data processes via the specifications 
that P3P offers, or at least that a higher expressivity is not only doable, but also 
sensible. However, future research will have to look into a careful comparison of 
both approaches by matching the results of the use case described herein to the 
expressivity of P3P. 

More detailed information about the ongoing work and results achieved can
be found in PrimeLife Deliverable D5.2.3 [Pri11b], and in PrimeLife Heartbeat
H5.2.2 [Pri11a].


Chapter 20 
Policy Implementation in XACML 


Slim Trabelsi and Akram Njeh 


Abstract This chapter presents the implementation details of the PrimeLife policy 
engine (called PPL engine). This engine is primarily in charge of interpreting the 
policies and the preferences defined by the Data Controllers and the Data Subjects. 
Additionally, this engine is responsible for the enforcement of the privacy rules 
specified by the user. The enforcement is characterised by the application of the 
access control rules, the execution of the obligations and the generation/verification
of the cryptographic proof related to the credentials. In this chapter we describe 
the architecture of this engine, the structure of policy language, and finally the data 
model of the implementation. 


20.1 Introduction 


Since the PPL language is specified as an extension of the XACML (eXtensible
Access Control Markup Language) language, the PPL engine is designed to run
together with any XACML engine (that only handles XACML access control rules).
The architecture chosen for the deployment of the PPL engine is symmetric because
Data Subjects and Data Controllers have similar requirements: deciding whether
a given piece of personal information (resp. collected data) can be shared with a
Data Controller (resp. Third Party); handling obligations associated with data;
storing data and associated preferences (resp. sticky policies). Using the same
architecture everywhere also makes it possible to handle scenarios where one party
has multiple roles (e.g., collecting data and then disclosing it to third parties).
The PPL engine executes multiple tasks in an automated way, such as: enforcing access
control policies, generating and verifying cryptographic proofs related to credential
requirements, matching data handling preferences against data handling policies,
generating and enforcing sticky policies, checking authorisations, controlling the
downstream usage of data, handling obligations, etc.


20.2 Architecture


In this section, we will present the design phase of the PPL engine. We present the 
architecture of the PPL system by defining a high level and a detailed architecture. 


20.2.1 High Level Architecture 


As presented in Figure 20.1, the high level architecture gives an abstract
overview of the PPL architecture and the interaction between the different entities:
Data Subject, Data Controller and Third Party (downstream usage).

Fig. 20.1: High level architecture.


20.2.1.1 Data Subject 


Policy engine: This component is in charge of parsing and interpreting the privacy
preferences of the Data Subject. The policy engine supports all capabilities of the
PrimeLife language (preferences, access control, DHP, obligations, credentials,
etc.). For this reason, this module is replicated on the Data Controller side and
the Third Party side.

Repository: Represents the PII and policy repositories. It is a database contain- 
ing data owned by the Data Subject. This data could be composed of personal data, 
credentials, certificates, and other information that should be used during the
interaction with the Data Controller application. It also contains the policy files
representing her privacy preferences.

Interface and communication: This interface represents a communication inter- 
face with the Data Controller implementing the message exchange protocol. 


20.2.1.2 Data Controller


Policy engine: This component is the same as the one described in the Data Subject 
section. 

Repository: This repository represents a database that contains all the informa- 
tion collected from the Data Subject during her interaction with the Data Controller. 
This data represents PIIs, credentials, certificates, and other information provided 
by the user. Also, this database contains the privacy policies related to the different 
resources and services that the Data Controller holds. 

Interface and communication: This interface represents a communication inter- 
face with the Data Subject implementing the message exchange protocol. This in- 
terface plays the role of user interface described in the data subject section, in case 
of downstream interaction between the Data Controller and a Third Party. 


20.2.1.3 Third Party


All the components supported by this actor are the same as those described in the
Data Controller section. This is due to the fact that the Third Party plays the role of
a Data Controller in case of downstream usage of the data.


20.2.2 Detailed Architecture 


The entire architecture can be represented by three layers: The first one presents 
the user interface layer. The second, business layer, represents the core of the PPL 
Engine. The last layer represents the persistence layer that is in charge of data per- 
sistence. 


20.2.2.1 Presentation Layer 


The presentation layer is responsible for the display to the end user. It contains
two components:


• The policy editor: Displays and provides a way to manage all the information
related to the Data Subject, Data Controller and the Third Party. This information
can be the personal data (PIIs, credentials, etc.), the privacy policy or
preference, or the information involved during a transaction between the different
entities. This component is not yet integrated into the current version of the
demonstrator but should be part of the next release.

• The matching handler: Displays to the user the result of the matching. In the case
of a mismatch, a set of tools is provided that allows the Data Subject to manage,
or make an informed decision about, the mismatch.


Fig. 20.2: Detailed architecture.


The UI layer should be independent from the business layer. For that, an interface 
component is deployed between these two layers in order to provide an abstraction 
level. 


20.2.2.2 Business Layer 


The business layer, which represents the core of the PPL engine, is composed of
four main elements that implement the new concepts introduced within PrimeLife.
These components are:


• Policy Enforcement Point (PEP): This component formats and then dispatches
the data to the corresponding component according to the state of the execution
process. The decision made by the PDP is enforced in the PEP, meaning that if
the PDP decides to provide data or to grant access to a resource, this data
or resource is collected, formatted and sent to the receiver through the PEP.

• Policy Decision Point (PDP): This is the core of the PPL engine where all
decisions are made. It can be broken down into two subcomponents:

  - Matching engine: This functionality matches the preferences of the
  Data Subject against the privacy policy of the Data Controller. The matching is
  done to verify that the intentions of the Data Controller in terms of PII usage
  are compliant with the Data Subject’s preferences.

  - Access control engine: This component is in charge of the application of the
  access control rules related to the local resources. It analyses the resource
  query, checks the access control policy of the requested resource and decides
  whether or not the requester satisfies the rules.

• Credential handler: One of the new features introduced in PPL is the support of
credential based access control. This feature is implemented by the credential
handler, which manages the collection of credentials held by an entity, selects the
appropriate credentials in order to generate a cryptographic proof and verifies the
cryptographic proofs of the claims received from external entities. The credential
handler contains the subcomponent Rule Verification: the PPL policy
contains a description of the credential requirements (for access control), and the
Rule Verification component evaluates whether the claim provided by a user who
wants to access a resource satisfies the credential based access control rule.

• Obligation handler: This is responsible for handling the obligations that should
be satisfied by the Data Controller/Third Party. This engine executes two main
tasks: it sets up the triggers related to the actions required by the privacy
preferences of the Data Subject, and executes the actions specified by the Data
Subject whenever it is required.
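
The division of labour between the PEP and the PDP described above can be illustrated
with a minimal Java sketch. It is not taken from the PPL engine: all class and method
names (SketchPdp, SketchPep, decide, handle) are hypothetical, and the access control
rule is reduced to a set of permitted requesters per resource.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical decision values; real XACML also has NotApplicable and Indeterminate.
enum SketchDecision { PERMIT, DENY }

// Hypothetical PDP: takes decisions but never touches the data itself.
class SketchPdp {
    private final Map<String, Set<String>> allowedRequesters = new HashMap<>();

    void allow(String resource, String... requesters) {
        allowedRequesters.put(resource, new HashSet<>(Arrays.asList(requesters)));
    }

    SketchDecision decide(String requester, String resource) {
        Set<String> allowed = allowedRequesters.get(resource);
        return allowed != null && allowed.contains(requester)
                ? SketchDecision.PERMIT
                : SketchDecision.DENY;
    }
}

// Hypothetical PEP: asks the PDP and enforces whatever it decided.
class SketchPep {
    private final SketchPdp pdp;
    private final Map<String, String> store = new HashMap<>();

    SketchPep(SketchPdp pdp) {
        this.pdp = pdp;
    }

    void put(String resource, String value) {
        store.put(resource, value);
    }

    // Collects and returns the resource only if the PDP permits access.
    Optional<String> handle(String requester, String resource) {
        if (pdp.decide(requester, resource) == SketchDecision.PERMIT) {
            return Optional.ofNullable(store.get(resource));
        }
        return Optional.empty();
    }
}
```

In this sketch the PEP is the only component that reads from the store, so a Deny
decision simply results in an empty answer to the requester.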


The other components of the architecture play a secondary role in the concept
introduced by the PPL engine:


• Web server: An embedded web server that represents the entry point of the core
of the PPL engine. It can be seen as an interface to the PEP.

• Persistence handler: Can be described as an interface between the business layer
and the persistence layer. It encapsulates access to the storage medium for business
objects and makes the location and storage model of the data it manipulates
transparent to the business layer. In general, this layer is supported by a
persistence framework. The objects defined in this layer are generally DAOs (Data
Access Objects). The persistence handler provides management functions to handle
the DAOs, known as CRUD (Create, Retrieve, Update, Delete) methods, and thereby
the functions to manage the PIIs and the policies in the different databases.
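
As an illustration of the CRUD methods mentioned above, the following Java sketch shows
what a DAO contract could look like. The interface SketchDao and the in-memory class
InMemoryDao are assumptions made for this example; the actual persistence handler
relies on a persistence framework and its own DAO classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical generic DAO contract exposing the four CRUD operations.
interface SketchDao<K, V> {
    void create(K key, V value);
    Optional<V> retrieve(K key);
    boolean update(K key, V value);
    boolean delete(K key);
}

// In-memory stand-in for a real store (database, LDAP, ...).
class InMemoryDao<K, V> implements SketchDao<K, V> {
    private final Map<K, V> rows = new HashMap<>();

    @Override
    public void create(K key, V value) {
        if (rows.containsKey(key)) {
            throw new IllegalStateException("key already exists: " + key);
        }
        rows.put(key, value);
    }

    @Override
    public Optional<V> retrieve(K key) {
        return Optional.ofNullable(rows.get(key));
    }

    // Returns false if there is nothing to update.
    @Override
    public boolean update(K key, V value) {
        if (!rows.containsKey(key)) {
            return false;
        }
        rows.put(key, value);
        return true;
    }

    // Returns false if the key was not stored.
    @Override
    public boolean delete(K key) {
        boolean existed = rows.containsKey(key);
        rows.remove(key);
        return existed;
    }
}
```

Because the business layer only sees the interface, the in-memory map could be swapped
for a database-backed implementation without changing the calling code.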


20.2.2.3 Persistence layer 


This layer is in charge of storing and persisting all the data involved in the system
(PIIs, policies, preferences, ...). This layer is composed of two elements:


• PII/Policy store: This database (or other means of persistence, such as LDAP, etc.)
contains all of the information related to the PIIs and their related policies.

• Credential store: This database contains all the credentials and certified
information held by an entity. Access to this store is exclusively dedicated to the
credential handler component.


20.3 PPL Policy Language Structure 


Fig. 20.3: Policy Language Structure.


The PPL language extends XACML 2.0 with a number of privacy-enhancing and 
credential based features. The PPL language is intended to be used: 


• By the Data Controller to specify the access restrictions to the resources that she
offers;

• By the Data Subject to specify access restrictions to her personal information, and
how she wants her information to be treated by the Data Controller afterwards;

• By the Data Controller to specify how “implicitly” collected personal information
(i.e., information that is revealed by the mere act of communicating, such as
IP address, connection time, etc.) will be treated;

• By the Data Subject to specify how she wants this implicit information to be
treated.


For that, we maintain the overall structure of the XACML language and we intro- 
duce a number of new elements to support the advanced features that our language 
aims to offer. 


20.3.1 PolicySets, Policy and Rules


As in XACML, the main elements of our language are PolicySet, Policy and Rule.
These elements are inherited from XACML. Each Rule element has an effect,
either Permit or Deny, that indicates the consequence when all conditions
stated in the Rule have been satisfied. Rules are grouped together in a Policy. When a
Policy is evaluated, the rule combining algorithm¹ of the policy (as stated in an XML
attribute of the Policy) defines how the effects of the applicable rules are combined
to determine the effect of the Policy. Policies, in turn, are grouped together
in a PolicySet; the effect of a PolicySet is determined by the effects of the contained
Policies and the stated policy combining algorithm. Finally, different PolicySets can
be further grouped together in parent PolicySets.

The PolicySet, Policy and Rule elements are composed of different elements:


• A Target (plain XACML:Target), which describes the Resource, the Subject, and
the environment variables for which this PolicySet, Policy or Rule is applicable;

• CredentialRequirements, describing the credentials that need to be presented in
order to be granted access to the resource; this element is not defined in the
original XACML language;

• ProvisionalActions, describing which actions (e.g., revealing attributes or signing
statements) have to be performed by the requester in order to be granted access;
this element is not defined in the original XACML language;

• XACML:Condition, specifying further restrictions on the applicability of the rule
beyond those specified in the Target and the CredentialRequirements;

• DataHandlingPolicy, describing how the information that needs to be revealed
to satisfy this rule will be treated afterwards; this element is not defined in the
original XACML language;

• DataHandlingPreferences, describing how the information contained in the
resource that is protected by this rule has to be treated; this element is not defined
in the original XACML language.
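
The way a rule combining algorithm turns the effects of the applicable rules into the
effect of a Policy can be sketched in a few lines of Java. The sketch covers only the
deny-overrides and permit-overrides algorithms and only the two effects Permit and
Deny; the names RuleEffect and CombiningSketch are hypothetical, and the further
results known to XACML (NotApplicable, Indeterminate) are deliberately left out.

```java
import java.util.List;

// Only the two effects named in the text; other XACML results are ignored here.
enum RuleEffect { PERMIT, DENY }

class CombiningSketch {

    // Deny-overrides: a single Deny among the applicable rules wins.
    static RuleEffect denyOverrides(List<RuleEffect> effects) {
        requireNonEmpty(effects);
        return effects.contains(RuleEffect.DENY) ? RuleEffect.DENY : RuleEffect.PERMIT;
    }

    // Permit-overrides: a single Permit among the applicable rules wins.
    static RuleEffect permitOverrides(List<RuleEffect> effects) {
        requireNonEmpty(effects);
        return effects.contains(RuleEffect.PERMIT) ? RuleEffect.PERMIT : RuleEffect.DENY;
    }

    // Simplification: this sketch refuses to decide when no rule is applicable.
    private static void requireNonEmpty(List<RuleEffect> effects) {
        if (effects.isEmpty()) {
            throw new IllegalArgumentException("no applicable rules");
        }
    }
}
```

For a Policy whose applicable rules evaluate to one Permit and one Deny, the two
algorithms therefore yield opposite results.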


20.3.2 Credential Requirements 


The policy language that we present is geared towards enabling technology-independent,
user-centric and privacy-friendly access control on the basis of certified credentials.
By a credential we mean an authenticated statement about attribute values made
by an issuer, where the statement is independent from a concrete mechanism for
ensuring authenticity. The statement made by the issuer is meant to affirm a
qualification. As credentials are not directly supported in the traditional policy
languages, we extended the XACML Rule element such that credentials are the basic
unit for reasoning about access control.


¹ Rule combining algorithms provide the final authorisation decision by combining the
effects of all the rules in a policy, for example: Deny-overrides: if one of the rules
evaluates to Deny, then the final authorisation decision is Deny. Permit-overrides: if
any rule evaluates to Permit, then the final authorisation decision is also Permit.

Each Rule can contain a CredentialRequirements element to specify the creden- 
tials that have to be presented in order to satisfy the Rule. The CredentialRequire- 
ments element contains a separate Credential element for each credential that needs 
to be presented. Each Credential element contains a unique identifier CredentialId
that is used to refer to the credential from elsewhere in the Rule.

The CredentialRequirements element can also occur in parent Policy and PolicySet
elements. These occurrences follow a typical distributive semantics; namely, one should
treat the CredentialRequirements element of a Rule as if it contained all Credential 
elements specified within the rule itself, as well as those specified within all parent 
Policy and PolicySet elements. 
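
The distributive semantics described above amounts to collecting the Credential
elements of a Rule together with those of all its ancestors. The following Java sketch
illustrates this; the class PolicyNodeSketch is hypothetical, credentials are reduced
to their identifiers, and the identifiers used in the usage note are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical node standing for a Rule, a Policy or a PolicySet.
class PolicyNodeSketch {
    private final PolicyNodeSketch parent;     // null for the top-level PolicySet
    private final List<String> credentialIds;  // Credential elements declared on this node

    PolicyNodeSketch(PolicyNodeSketch parent, String... credentialIds) {
        this.parent = parent;
        this.credentialIds = new ArrayList<>(Arrays.asList(credentialIds));
    }

    // Effective requirements: own credentials first, then those of every ancestor.
    List<String> effectiveCredentials() {
        List<String> result = new ArrayList<>();
        for (PolicyNodeSketch node = this; node != null; node = node.parent) {
            result.addAll(node.credentialIds);
        }
        return result;
    }
}
```

A Rule declaring "creditCard" inside a Policy declaring "studentCard" inside a
PolicySet declaring "idCard" would thus be treated as if it required all three
credentials.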


20.3.3 Provisional Actions 


The ProvisionalActions element is used to specify the provisional actions that a
requester must perform before being granted access to the resource. Currently
supported actions include revealing attributes (to the Data Controller or to a Third
Party), optionally under a data handling policy and with a credential proof; signing a
statement; and so-called “spending” of credentials, which allows restrictions to be
placed on the number of times that the same credential is used to obtain access. Each
action is described in a ProvisionalAction element, which can refer to
DataHandlingPolicy and Credential elements; the language has to be extensible so
that new types of ProvisionalActions can easily be added later on.


20.3.4 Data Handling Policies 


The main purpose of the data handling policies is for the Data Controller to express 
what will happen to the information provided by the Data Subject during an access 
request. The provisional action to reveal an attribute value, for example, therefore 
contains an optional reference to the applicable DataHandlingPolicy. Each Rule,
Policy, or PolicySet element can contain a number of DataHandlingPolicy elements. A
DataHandlingPolicy can be referred to from anywhere in the rule by its unique PolicyId
identifier.

A DataHandlingPolicy consists of a set of Authorisations that the Data Con- 
troller wants to obtain on the collected information, and a set of Obligations, that 
she promises to adhere to. 

Before the Data Subject reveals her information, these Authorisations and
Obligations are matched against the Data Subject’s DataHandlingPreference to see
whether a matching StickyPolicy can be agreed upon.


20.3.5 Data Handling Preferences


The DataHandlingPreference element defines how the information obtained
from the resource protected by the Data Subject is to be treated after access is
granted. The preferences are expressed by means of a set of Authorisations and
Obligations, just like the DataHandlingPolicy element. When access to the resource is
requested, the DataHandlingPreference element has to be matched against a proposed
DataHandlingPolicy to derive the applicable StickyPolicy, if a match can be
found.

An important difference between DataHandlingPreference and DataHandlingPolicy
is the resource that they pertain to. The DataHandlingPreference always
describes how the resource protected by the Data Subject itself has to be treated
after being collected, whereas the DataHandlingPolicy pertains to the information
that a requester will have to reveal in order to be granted access to the
resource.

The main use of DataHandlingPreference is for a Data Subject to specify how 
she wants her PII to be treated by a Data Controller, i.e., which Authorisations she 
grants to the Data Controller with respect to her personal data, and which Obliga- 
tions the Data Controller will have to adhere to. 

Optionally, if the DataHandlingPreference contains an AuthorisationDownstreamUsage,
the latter can include a Policy specifying the downstream access control policy,
i.e., the access control policy that has to be enforced on the downstream Data
Controllers.


20.3.6 Sticky Policies 


The StickyPolicy element is associated with a resource and contains the agreed-upon
sets of granted Authorisations and promised Obligations with respect to that resource.
The StickyPolicy is usually the result of an automated matching procedure between the
Data Subject’s DataHandlingPreference and the Data Controller’s DataHandlingPolicy.
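
To make the idea of the matching procedure more concrete, here is a strongly simplified
Java sketch. It rests on assumptions that are not stated in the text: authorisations
are reduced to purpose names, obligations to plain strings, a match is assumed to
succeed when the policy requests no purpose beyond those granted by the preference and
promises every obligation the preference requires, and the resulting sticky policy is
assumed to be simply the matched policy terms. The names DhTermsSketch and
MatcherSketch are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical container used for preference, policy and sticky policy alike.
class DhTermsSketch {
    final Set<String> purposes;     // authorised (or requested) usage purposes
    final Set<String> obligations;  // required (or promised) obligations

    DhTermsSketch(Set<String> purposes, Set<String> obligations) {
        this.purposes = new HashSet<>(purposes);
        this.obligations = new HashSet<>(obligations);
    }
}

class MatcherSketch {

    // Returns the agreed terms if the policy fits the preference, otherwise empty.
    static Optional<DhTermsSketch> match(DhTermsSketch preference, DhTermsSketch policy) {
        // Assumption: the controller may not request more than the subject grants.
        boolean purposesOk = preference.purposes.containsAll(policy.purposes);
        // Assumption: the controller must promise at least what the subject requires.
        boolean obligationsOk = policy.obligations.containsAll(preference.obligations);
        if (!purposesOk || !obligationsOk) {
            return Optional.empty();
        }
        return Optional.of(new DhTermsSketch(policy.purposes, policy.obligations));
    }

    // Small helper for building sets in examples.
    static Set<String> setOf(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }
}
```

A real matching engine would have to compare structured obligations (triggers and
actions with parameters) rather than opaque strings, and would report the mismatch
to the user instead of returning an empty result.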

The main difference between the StickyPolicy and the DataHandlingPreferences
is that the former contains the Authorisations and Obligations that the policy-hosting
entity itself has to adhere to, while the latter contains Authorisations and Obligations
that an eventual recipient has to adhere to. Typically, a Data Subject will not impose
on herself any Authorisation or Obligation concerning her own PII, so her
policy will not contain a StickyPolicy element. The Data Controller, on the other
hand, will describe in the StickyPolicy the Authorisations and Obligations that she
herself has to adhere to, while the DataHandlingPreferences contain those that a
Downstream Data Controller has to adhere to. Usually, the Downstream Data Controller
(Third Party) will be subject to the same or stronger restrictions than the Data
Controller herself, meaning that the policy specified in the DataHandlingPreferences
will usually be at most as permissive as the policy specified in the StickyPolicy.


20.3.7 Obligations 


The elements DataHandlingPolicy, DataHandlingPreferences, and StickyPolicy
contain a set of Obligation elements. An Obligation is defined as: “A promise made by
a Data Controller to a Data Subject in relation to the handling of her PII. The Data
Controller is expected to fulfill the promise by executing and/or preventing a specific
action after a particular event, e.g., time, and optionally under certain conditions.”
As defined previously, an obligation is often expressed as Event-Condition-Action:


On Event If Condition Do Action. 


To facilitate the comparison of obligations, we consider Triggers as events
filtered by conditions. In other words, we replace the notions of events and conditions
by Trigger. The Triggers are the events that are considered by an obligation and can be
seen as the set of events that result in actions.


Obligations in the PPL language consist of a set of Obligation elements. Each of
these defines a set of Triggers describing the events that trigger the obligation, and
the related Action that has to be performed.
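
The structure of an Obligation as a set of Triggers plus one Action can be sketched in
Java as follows. The class ObligationSketch is hypothetical, triggers are reduced to
plain event names, and the event names in the usage note are invented for illustration;
they are not part of the PPL trigger vocabulary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical obligation: a set of trigger names plus one action.
class ObligationSketch {
    private final List<String> triggers;
    private final Runnable action;

    ObligationSketch(Runnable action, String... triggers) {
        this.action = action;
        this.triggers = new ArrayList<>(Arrays.asList(triggers));
    }

    // Runs the action if the event is one of the triggers; reports whether it fired.
    boolean onEvent(String event) {
        if (!triggers.contains(event)) {
            return false;
        }
        action.run();
        return true;
    }
}
```

An obligation created with the triggers "data-accessed" and "data-sent" would run its
action for either of these two events and ignore any other event signalled to it.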


The reason why we did not choose to use the standard XACML:Obligations element
to specify the obligations (in which we embed the data handling policies
or preferences), is that XACML Obligations can only be used to specify obligations 
that the PEP has to adhere to when an access request occurs. This cannot be used for 
our data handling policies, since the latter pertain to information that the requester 
will have to reveal in order to obtain access, rather than to the resource being pro- 
tected. This XACML element cannot be reused for our data handling preferences 
either, since the latter specifies obligations that the recipient of the resource has to 
adhere to, rather than the PEP that is protecting access to the resource. In other 
words, by populating the XACML:Obligations element that protects her personal
data, a Data Subject would impose obligations that she herself has to adhere to each 
time a Data Controller requests access to the personal data, rather than imposing 
obligations on the Data Controller. 


Since obligations triggered by access requests are only a small subclass of the 
obligations that we consider here, we chose to leave the storage and enforcement of 
obligations entirely up to the obligation Engine, and let the PEP simply signal the 
obligation Engine each time an access request occurs. 


20.3.8 Authorisations


DataHandlingPolicy, DataHandlingPreference, and StickyPolicies contain, apart 
from the set of Obligations described above, also a set of Authorisations. While 
obligations specify actions that the Data Controller is required to perform on the 
transmitted information, authorisations specify actions that it is allowed to perform. 
Similarly to what we did for obligations, we recognise that it is impossible to define 
an exhaustive list of authorisations that covers all needs that may ever arise in the 
real world. Rather, we define a generic, user-extensible structure for authorisations 
so that new, possibly industry-specific authorisation vocabularies can be added later 
on. We do provide, however, a basic authorisation vocabulary for using data for cer- 
tain purposes and for downstream access control (to forward the information to third 
parties), and we describe how these authorisations can be efficiently matched via a 
general strategy. 


• Authorisation Purposes: The first concrete authorisation type that we define is
the authorisation to use information for a particular set of purposes. In the
AuthUseForPurpose element, purposes are referred to by standard URIs specified in
agreed-upon vocabularies of usage purposes. These vocabularies of URIs may be
organised as flat lists or as hierarchical ontologies.

• Authorisation for downstream usage: The second concrete authorisation type that we
define, called AuthDownStreamUsage, is the authorisation to forward the information
to third parties, so-called downstream Data Controllers. Optionally, this
authorisation enables the Data Subject to specify the access control policy under
which the information will be made available, i.e., the minimal access control
policy that the (primary) Data Controller has to enforce when sharing the
information with downstream Data Controllers.
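
When purposes are organised as a hierarchical ontology, one plausible way of checking
whether a requested purpose is covered by a granted one is to walk up the hierarchy.
The Java sketch below illustrates this under the assumption that granting a broader
purpose also covers its narrower sub-purposes; this rule, the class
PurposeOntologySketch and the example.org URIs in the usage note are assumptions made
for illustration, not part of the PPL specification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical purpose ontology: each purpose URI points to its broader parent.
class PurposeOntologySketch {
    private final Map<String, String> parentOf = new HashMap<>();

    // Registers 'child' as a narrower purpose of 'parent' (hierarchy assumed acyclic).
    void addChild(String child, String parent) {
        parentOf.put(child, parent);
    }

    // A requested purpose is covered if it equals the granted one or descends from it.
    boolean covers(String granted, String requested) {
        for (String purpose = requested; purpose != null; purpose = parentOf.get(purpose)) {
            if (purpose.equals(granted)) {
                return true;
            }
        }
        return false;
    }
}
```

With http://example.org/purposes#email-marketing registered as a child of
http://example.org/purposes#marketing, granting the latter covers the former but not
the other way round. For a flat vocabulary the check degenerates to plain equality.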


20.4 PPL Engine Data Model 


In this section, we describe the implementation details of the PPL engine. We
use class and package diagrams to describe in detail how the different components
of the engine are implemented. In order to facilitate the manipulation of the different
elements of the XML policy, we decided to map all these elements onto Java classes.
Instances of these classes are then stored in the persistence database and retrieved
as soon as we need to read, modify or generate a policy. For example, if we want to
generate a sticky policy after matching a privacy policy and a preference, we
instantiate the classes related to the elements of this sticky policy and generate
an XML file from them. This method is less complex than selecting and assembling
pure XML elements. In this chapter, all the PPL language elements are described as
Java classes.

In the following sections, we use the PPL prefix to mark an element as a PrimeLife
element, and the XACML prefix for XACML elements.


Fig. 20.4: Data model package diagram representing the dependencies between the 
packages. 


20.4.1 Package pii 


The package pii contains the data class to represent the PII. It consists of the
PIIType class. The PIIType is used to represent the Personally Identifiable
Information (PII) in a simple way.

This class is composed of: 


• The AttributeName element, describing the name identifier of the PII, for
example http://www.w3.org/2006/vcard/ns#email, which indicates the email PII, or
http://www.fgov.be/eID/address, which indicates the address information;

• The AttributeValue element, describing the value of the PII; for example, for
the previous AttributeName examples, we can have mail@mail.com as the value
corresponding to the http://www.w3.org/2006/vcard/ns#email AttributeName;

• CreationDate and DateModification, providing extra date information to the
user.
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20.4.2 Package policy.Impl 


This package contains all the data classes to represent the skeleton of the language 
data structure. 


[Class diagram: PolicySetType, PolicyType and RuleType; CommonDHPSPType with
DataHandlingPolicyType, DataHandlingPreferencesType and StickyPolicyType;
AuthorizationsSetType and AuthorizationType with AuthzUseForPurpose and
AuthzDownstreamUsage; ProvisionalActionsType and ProvisionalActionType.]


Fig. 20.5: Simplified policy data model class diagram. 


At the top of the tree structure, we have the PPL PolicySetType and PPL PolicyType
elements, which respectively extend the XACML PolicySet and XACML Policy
classes defined in the XACML package. Both PPL classes implement the
PPLEvaluatable interface. We use the PPLEvaluatable interface to provide a generic
way to define a policy, because a policy can be defined by either the PolicySetType
class or the PolicyType class.

The PolicySetType class is a top-level element. It is an aggregation of other
PolicySetType and PolicyType elements. The PolicyType class, in turn, is an
aggregation of other PolicyType and RuleType elements.
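
The aggregation described above can be illustrated with a minimal Java sketch. It
is not the engine's actual code: the XACML superclasses are left out, the field
names and the getId method are assumptions, and countRules is a hypothetical helper
added only to show how the common PPLEvaluatable type lets a caller walk the tree
without knowing whether it starts at a policy set or at a policy.

```java
import java.util.ArrayList;
import java.util.List;

// A policy is either a PolicySetType or a PolicyType (XACML superclasses omitted).
interface PPLEvaluatable {
    String getId();
}

// Placeholder for a rule; the real RuleType carries more data.
final class RuleType {
    final String id;
    RuleType(String id) { this.id = id; }
}

// Aggregates nested policies and rules.
final class PolicyType implements PPLEvaluatable {
    private final String id;
    final List<PolicyType> policies = new ArrayList<>();
    final List<RuleType> rules = new ArrayList<>();
    PolicyType(String id) { this.id = id; }
    @Override public String getId() { return id; }
}

// Aggregates nested policy sets and policies.
final class PolicySetType implements PPLEvaluatable {
    private final String id;
    final List<PPLEvaluatable> children = new ArrayList<>();
    PolicySetType(String id) { this.id = id; }
    @Override public String getId() { return id; }
}

final class PolicyTree {
    // Hypothetical helper: counts every rule reachable from a policy or policy set.
    static int countRules(PPLEvaluatable node) {
        int n = 0;
        if (node instanceof PolicySetType) {
            for (PPLEvaluatable child : ((PolicySetType) node).children) {
                n += countRules(child);
            }
        } else if (node instanceof PolicyType) {
            PolicyType policy = (PolicyType) node;
            n += policy.rules.size();
            for (PolicyType nested : policy.policies) {
                n += countRules(nested);
            }
        }
        return n;
    }

    public static void main(String[] args) {
        PolicyType policy = new PolicyType("p1");
        policy.rules.add(new RuleType("r1"));
        PolicySetType root = new PolicySetType("root");
        root.children.add(policy);
        System.out.println(root.getId() + " contains " + countRules(root) + " rule(s)");
    }
}
```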

The PolicySetType, PolicyType and RuleType classes are composed of several other
classes: CommonDHPSPType, which is a generic class for the DataHandlingPolicy,
DataHandlingPreferences and StickyPolicy classes; the CredentialRequirements
class; and the ProvisionalActions class.

As the DataHandlingPolicyType, DataHandlingPreferencesType and StickyPolicyType
classes represent the same data structure and differ only in how they are interpreted,
they extend the generic class CommonDHPSPType.


The CommonDHPSPType consists of a set of authorisations, defined by the Au- 
thorisationsSetType class, and a set of obligations, expressed in the ObligationsSet 
class (defined in the next section). 


The AuthorisationsSetType class is composed of a set of authorisations defined
by the abstract class AuthorisationType. The class is abstract because our language
supports an extensible authorisation vocabulary; we predefine two concrete
authorisation types here. The first is the authorisation to use the information for a
list of purposes, enumerated inside the AuthzUseForPurpose class. The second
predefined authorisation type, AuthzDownstreamUsage, contains a Boolean attribute
indicating whether downstream usage is allowed or not, together with a PolicyType
attribute that represents the policy preferences for the third party.
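
As a rough illustration of this part of the data model, the following Java sketch
mirrors the classes named above, using the spelling of the class names shown in
Fig. 20.5. The constructors, the placeholder classes, the purpose URIs and the two
query helpers (allowsPurpose and allowsDownstreamUsage) are assumptions made for
the example and are not taken from the engine.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholders so that the sketch compiles on its own.
final class PolicyType { }
final class ObligationsSet { }

// Abstract base class: the authorisation vocabulary is extensible.
abstract class AuthorizationType { }

// Authorisation to use the information for a list of purposes.
final class AuthzUseForPurpose extends AuthorizationType {
    final List<String> purposes;
    AuthzUseForPurpose(List<String> purposes) { this.purposes = purposes; }
}

// Authorisation to forward the information to downstream Data Controllers.
final class AuthzDownstreamUsage extends AuthorizationType {
    final boolean allowed;
    final PolicyType policy; // policy for the third party; may be null
    AuthzDownstreamUsage(boolean allowed, PolicyType policy) {
        this.allowed = allowed;
        this.policy = policy;
    }
}

final class AuthorizationsSetType {
    final List<AuthorizationType> authorization = new ArrayList<>();

    // Hypothetical helper: is the purpose listed in some AuthzUseForPurpose?
    boolean allowsPurpose(String purpose) {
        for (AuthorizationType a : authorization) {
            if (a instanceof AuthzUseForPurpose
                    && ((AuthzUseForPurpose) a).purposes.contains(purpose)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: does some AuthzDownstreamUsage allow forwarding?
    boolean allowsDownstreamUsage() {
        for (AuthorizationType a : authorization) {
            if (a instanceof AuthzDownstreamUsage && ((AuthzDownstreamUsage) a).allowed) {
                return true;
            }
        }
        return false;
    }
}

// Generic class shared by policies, preferences and sticky policies.
class CommonDHPSPType {
    final AuthorizationsSetType authorizationsSet = new AuthorizationsSetType();
    final ObligationsSet obligationsSet = new ObligationsSet();
}

final class DataHandlingPolicyType extends CommonDHPSPType { }
final class DataHandlingPreferencesType extends CommonDHPSPType { }
final class StickyPolicyType extends CommonDHPSPType { }

final class AuthorizationDemo {
    public static void main(String[] args) {
        DataHandlingPreferencesType prefs = new DataHandlingPreferencesType();
        prefs.authorizationsSet.authorization.add(
            new AuthzUseForPurpose(List.of("http://www.example.org/purposes#delivery")));
        prefs.authorizationsSet.authorization.add(new AuthzDownstreamUsage(false, null));
        System.out.println(prefs.authorizationsSet.allowsDownstreamUsage());
    }
}
```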


The ProvisionalActionsType class is composed of a set of ProvisionalActionType
elements. Each of these describes a single provisional action that needs to be
fulfilled in order to satisfy a rule. The ProvisionalActionType class contains the
ActionId attribute and a set of XACML AttributeValue elements. The ActionId
attribute represents the identifier (URI) of the action to be performed. The set of
XACML AttributeValue elements represents the arguments of the action, which may
include other functions. The semantics of the arguments depend on the particular
action being performed. The following actions are predefined:


• http://www.primelife.eu/Reveal: This action requires the Data Subject to
  reveal an attribute. The attribute could be part of one of her credentials, or
  could be a self-stated, uncertified attribute. The action takes one or two
  arguments of data-type http://www.w3.org/2001/XMLSchema#anyURI. The first
  (mandatory) argument is the URI of the attribute to be revealed. The second
  (optional) argument is a URI referring to the credential identifier
  (CredentialId) of the credential that contains the attribute.

• http://www.primelife.eu/RevealUnderDHP: This action requires the Data
  Subject to reveal an attribute while specifying the data handling policy that
  will be applied to the attribute after it is revealed. The attribute could be part
  of one of her credentials, or could be a self-stated, uncertified attribute. The
  action takes two or three arguments of data-type
  http://www.w3.org/2001/XMLSchema#anyURI. The first (mandatory) argument is the
  URI of the attribute to be revealed. The second (mandatory) argument is a URI
  referring to the data handling policy under which the attribute has to be
  revealed. The third (optional) argument is a URI referring to the credential
  identifier of the credential that contains the attribute.

• http://www.primelife.eu/RevealTo: This action requires the requester to
  reveal an attribute to an external third party. The attribute could be part of
  one of her credentials, or could be a self-stated, uncertified attribute. The
  action takes two or three arguments of data-type
  http://www.w3.org/2001/XMLSchema#anyURI. The first (mandatory) argument is the
  URI of the attribute to be revealed. The second (mandatory) argument is the URI
  defining the third party to whom the attribute will be revealed. The third
  (optional) argument is a URI referring to the credential identifier of the
  credential that contains the attribute.

• http://www.primelife.eu/RevealToUnderDHP: This action requires the Data
  Subject to reveal an attribute to an external third party while specifying the
  data handling policy that will be applied to the attribute after it is revealed.
  The attribute could be part of one of her credentials, or could be a self-stated,
  uncertified attribute. The action takes three or four arguments of data-type
  http://www.w3.org/2001/XMLSchema#anyURI. The first (mandatory) argument is the
  URI of the attribute to be revealed. The second (mandatory) argument is the URI
  defining the third party to whom the attribute will be revealed. The third
  (mandatory) argument is a URI referring to the data handling policy under which
  the attribute has to be revealed. The fourth (optional) argument is a URI
  referring to the credential identifier of the credential that contains the
  attribute.

• http://www.primelife.eu/Sign: This action requires the requester to sign a
  statement before accessing the resource. How the signature is implemented
  depends on the underlying technology, but it carries the semantics that a
  verifier can check later that a particular Data Subject satisfying the policy
  explicitly agreed to the statement. The action takes a single argument of
  data-type http://www.w3.org/2001/XMLSchema#string describing the statement that
  needs to be signed.

• http://www.primelife.eu/Spend: This action requires the requester to “spend”
  one of her credentials, thereby imposing restrictions on how many times the same
  credential can be used in an access request. The action takes four mandatory
  arguments. The first is of data-type http://www.w3.org/2001/XMLSchema#anyURI
  and contains the CredentialId of the credential that has to be spent. The second
  and third arguments are of data-type http://www.w3.org/2001/XMLSchema#integer.
  The second argument is the number of units that have to be spent for this
  access; the third is the spending limit, i.e., the maximum number of units that
  can be spent with this credential on the same scope. The fourth argument is of
  data-type http://www.w3.org/2001/XMLSchema#string and defines the scope on which
  the credential has to be spent.
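
The argument counts given in the list above lend themselves to a simple
consistency check. The following Java sketch is only an illustration under stated
assumptions: it simplifies the XACML AttributeValue arguments to plain strings,
and the ProvisionalActionChecker class and its hasValidArgumentCount method are
hypothetical names that do not appear in the engine. Only the action URIs and the
minimum and maximum argument counts come from the descriptions above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Simplified stand-in: in the real model the arguments are XACML AttributeValues.
final class ProvisionalActionType {
    final String actionId;
    final List<String> arguments;

    ProvisionalActionType(String actionId, String... arguments) {
        this.actionId = actionId;
        this.arguments = Arrays.asList(arguments);
    }
}

final class ProvisionalActionChecker {
    static final String NS = "http://www.primelife.eu/";

    // Minimum and maximum number of arguments for each predefined action.
    private static final Map<String, int[]> ARITY = Map.of(
        NS + "Reveal", new int[] {1, 2},
        NS + "RevealUnderDHP", new int[] {2, 3},
        NS + "RevealTo", new int[] {2, 3},
        NS + "RevealToUnderDHP", new int[] {3, 4},
        NS + "Sign", new int[] {1, 1},
        NS + "Spend", new int[] {4, 4});

    // Returns true if the action is predefined and has an admissible argument count.
    // Actions outside the predefined list are rejected in this sketch.
    static boolean hasValidArgumentCount(ProvisionalActionType action) {
        int[] range = ARITY.get(action.actionId);
        if (range == null) {
            return false;
        }
        int n = action.arguments.size();
        return n >= range[0] && n <= range[1];
    }

    public static void main(String[] args) {
        ProvisionalActionType reveal = new ProvisionalActionType(
            NS + "Reveal", "http://www.w3.org/2006/vcard/ns#email");
        System.out.println(hasValidArgumentCount(reveal));
    }
}
```

A fuller check would also validate the data-type of each argument (anyURI, integer
or string); that is omitted here to keep the sketch short.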


20.4.3 Package Credential 


The CredentialRequirementsType class is composed of a set of credentials and
conditions. Each individual credential is a condition within a credential.

The PPL ConditionType class contains an abstract attribute of type XACML
Expression. The latter class provides a way to define an obligation within a
credential.

The CredentialType class is used to declare a credential that has to be held by
an access requester. It contains a CredentialId attribute that identifies the
credential element, and it has a 0..1 relation with the UndisclosedExpressionType
class. This class acts as a placeholder to indicate that a credential condition was
omitted due to policy sanitisation.


Also, the CredentialType class is related to the AttributeMatchAnyOfType class. 
This class can specify conditions directly within the CredentialType class. 


The AttributeMatchAnyOfType class is used for matching a given attribute with 
a list of values, whereby for every list element an individual matching algorithm is 
used. 

Although in principle any attribute can be matched, the AttributeMatchAnyOf
construction is particularly useful for providing lists of accepted credential types
or issuers. Clearly, if no credential types are explicitly specified, then any
credential type that contains the necessary attributes can be used to satisfy the
policy. If no issuers are specified, then credentials by any issuer are accepted.
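
To make the per-element matching concrete, the following Java sketch matches an
attribute value against a list in which every entry names its own matching
algorithm. It rests on several assumptions: the algorithm identifiers "equal" and
"starts-with" are hypothetical short names rather than real MatchId URIs, values
are simplified to strings, and the AnyOfMatcher class is not part of the engine.
The case where no list is specified at all (any value accepted) is left to the
caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// One list element: a literal plus the algorithm used to compare against it.
final class MatchValueType {
    final String matchId; // hypothetical short name, not a real MatchId URI
    final String literal;
    MatchValueType(String matchId, String literal) {
        this.matchId = matchId;
        this.literal = literal;
    }
}

final class AttributeMatchAnyOfType {
    final String attributeId;
    final List<MatchValueType> matchValues = new ArrayList<>();
    AttributeMatchAnyOfType(String attributeId) { this.attributeId = attributeId; }
}

final class AnyOfMatcher {
    // Assumed registry mapping a MatchId to a test of (attribute value, literal).
    private static final Map<String, BiPredicate<String, String>> ALGORITHMS = new HashMap<>();
    static {
        ALGORITHMS.put("equal", (value, literal) -> value.equals(literal));
        ALGORITHMS.put("starts-with", (value, literal) -> value.startsWith(literal));
    }

    // True if the value matches at least one element under that element's algorithm.
    // Elements with an unknown MatchId never match in this sketch.
    static boolean matches(AttributeMatchAnyOfType anyOf, String attributeValue) {
        for (MatchValueType mv : anyOf.matchValues) {
            BiPredicate<String, String> algorithm = ALGORITHMS.get(mv.matchId);
            if (algorithm != null && algorithm.test(attributeValue, mv.literal)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AttributeMatchAnyOfType issuers =
            new AttributeMatchAnyOfType("http://www.example.org/credential#issuer");
        issuers.matchValues.add(new MatchValueType("equal", "http://www.fgov.be"));
        issuers.matchValues.add(new MatchValueType("starts-with", "http://issuer.example.org/"));
        System.out.println(matches(issuers, "http://issuer.example.org/eid"));
    }
}
```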


The AttributeMatchAnyOfType class contains the following attributes: AttributeId,
which determines the name of the attribute in this credential that is matched
against the list of values; and Disclose, which determines the type of policy
disclosure used for this element when this policy is sent to the Data Subject.
Possible values of Disclose are “yes,” “no,” and “attributes-only.” When the
attribute is omitted, the default value “yes” is assumed.


When set to “yes,” this AttributeMatchAnyOf element is sent unmodified to the 
Data Subject. When set to “no,” this AttributeMatchAnyOf element is sanitised by 
means of the following substitutions: 


• The value of AttributeId is replaced with “undisclosed”;
• Each MatchValue child element within this AttributeMatchAnyOf element is
  replaced with an UndisclosedExpression element.


When set to “attributes-only,” then only the latter substitution is performed, i.e., 
all MatchValue child elements are replaced with an UndisclosedExpression element. 
(See the section policy sanitisation). 
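
The three disclosure modes can be summarised in a short Java sketch. This is an
illustration under assumptions, not the engine's sanitisation code: the
MatchElement interface, the Sanitiser class and the constructor shapes are
hypothetical, and the Disclose attribute that each MatchValue carries on its own is
ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Common type for the children of an AttributeMatchAnyOf element (assumed).
interface MatchElement { }

final class MatchValue implements MatchElement {
    final String literal;
    MatchValue(String literal) { this.literal = literal; }
}

// Placeholder that replaces a value hidden by sanitisation.
final class UndisclosedExpression implements MatchElement { }

final class AttributeMatchAnyOf {
    final String attributeId;
    final String disclose; // "yes", "no", "attributes-only", or null when omitted
    final List<MatchElement> children;

    AttributeMatchAnyOf(String attributeId, String disclose, List<MatchElement> children) {
        this.attributeId = attributeId;
        this.disclose = disclose;
        this.children = children;
    }
}

final class Sanitiser {
    // Applies the substitutions that correspond to the element's disclose mode.
    static AttributeMatchAnyOf sanitise(AttributeMatchAnyOf in) {
        String mode = (in.disclose == null) ? "yes" : in.disclose; // default is "yes"
        switch (mode) {
            case "yes":
                return in; // sent unmodified
            case "no":
                return new AttributeMatchAnyOf("undisclosed", mode, hideValues(in.children));
            case "attributes-only":
                return new AttributeMatchAnyOf(in.attributeId, mode, hideValues(in.children));
            default:
                throw new IllegalArgumentException("Unknown disclose value: " + mode);
        }
    }

    // Replaces every MatchValue child with an UndisclosedExpression.
    private static List<MatchElement> hideValues(List<MatchElement> children) {
        List<MatchElement> out = new ArrayList<>();
        for (MatchElement child : children) {
            out.add(child instanceof MatchValue ? new UndisclosedExpression() : child);
        }
        return out;
    }

    public static void main(String[] args) {
        List<MatchElement> values = new ArrayList<>();
        values.add(new MatchValue("http://www.fgov.be"));
        AttributeMatchAnyOf original =
            new AttributeMatchAnyOf("http://www.example.org/credential#issuer", "no", values);
        System.out.println(sanitise(original).attributeId);
    }
}
```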


The MatchValueType class defines a literal value against which the given attribute
(specified with AttributeId in the parent AttributeMatchAnyOf element) is matched,
as well as the matching algorithm that is used. The class contains the following
attributes:


• MatchId, which indicates the name of the matching algorithm that is used to
  match the attribute with the literal;
• DataType, which indicates the data type of the literal value against which the
  attribute will be matched;
• Disclose, which defines the type of policy disclosure used for this element
  when this policy is sent to the Data Subject. Possible values are “yes” and “no.”


20.4.4 Package Obligations 


[Class diagram: ObligationsSet, Obligation and TriggersSet; the Trigger class with
TriggerAtTime, TriggerPeriodic, TriggerPersonalDataSent, TriggerPersonalDataDeleted,
TriggerPersonalDataAccessedForPurpose, TriggerDataSubjectAccess, TriggerDataLost
and TriggerOnViolation; and the actions ActionDeletePersonalData,
ActionAnonymizePersonalData, ActionNotifyDataSubject, ActionLog and
ActionSecureLog.]


Fig. 20.6: Simplified obligation data model class diagram. 


The obligation package contains all of the data classes that define obligations.
The main class in this package is the ObligationsSet class, which is composed of a
set of Obligation objects. Each Obligation contains one TriggersSet, which in turn
contains a set of Trigger objects describing the events that trigger the obligation,
and one Action element defining the action that has to be performed.


There are different types of triggers that extend the abstract class Trigger, and
different types of actions that extend the abstract class Action. The language and
the design used to define obligations may differ slightly depending on whether the
obligations are expressed in the Data Controller’s privacy policy, in the Data
Subject’s privacy preferences, or in sticky policies:


• The Data Subject’s privacy preferences specify “required obligations,” i.e., what
  the Data Subject requires in terms of obligations in order to provide a given
  piece of personal data to a given Data Controller.

• The Data Controller’s privacy policy specifies “proposed obligations,” i.e., what
  the Data Controller is willing (and able) to enforce in terms of obligations for
  a given piece of collected data.

• The sticky policy specifies “committed obligations,” i.e., the obligations that
  the Data Subject and the Data Controller agreed upon and that must be enforced
  by the Data Controller.


Here is a brief description of some of the common triggers and actions: 


• Trigger at Time: A time-based trigger that occurs only once between start and
  start + maxDelay.

• Trigger Periodic: A time-based trigger that occurs multiple times on a periodic
  basis between start and end.

• Trigger Personal Data Accessed for Purpose: An event-based trigger that occurs
  each time the personal data associated with the obligation is accessed for one
  of the specified purposes.

• Trigger Personal Data Sent: An event-based trigger that occurs when the PII
  associated with the obligation is shared with a third party (downstream Data
  Controller).

• Action DeletePersonalData: This action deletes a specific piece of information,
  and is intended for handling data retention.

• Action NotifyDataSubject: This action notifies the Data Subject when triggered,
  i.e., it sends the trigger information to the Data Subject.

• Action Log: This action logs an event, e.g., it writes the trigger information
  to a trace file.
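
The two time-based triggers described above can be sketched in Java as follows.
The class names follow Fig. 20.6, but everything else is an assumption made for
illustration: the constructors, the isWithinWindow and nominalOccurrences helpers,
the choice to treat both window bounds as inclusive, and the choice to let the first
periodic occurrence coincide with start. The maxDelay attribute of TriggerPeriodic
is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

abstract class Trigger { }
abstract class Action { }

// Occurs once between start and start + maxDelay.
final class TriggerAtTime extends Trigger {
    final Instant start;
    final Duration maxDelay;

    TriggerAtTime(Instant start, Duration maxDelay) {
        this.start = start;
        this.maxDelay = maxDelay;
    }

    // Hypothetical helper: is t inside [start, start + maxDelay] (inclusive)?
    boolean isWithinWindow(Instant t) {
        return !t.isBefore(start) && !t.isAfter(start.plus(maxDelay));
    }
}

// Occurs repeatedly between start and end.
final class TriggerPeriodic extends Trigger {
    final Instant start;
    final Instant end;
    final Duration period;

    TriggerPeriodic(Instant start, Instant end, Duration period) {
        if (period.isZero() || period.isNegative()) {
            throw new IllegalArgumentException("period must be positive");
        }
        this.start = start;
        this.end = end;
        this.period = period;
    }

    // Hypothetical helper: start, start + period, ... up to and including end.
    List<Instant> nominalOccurrences() {
        List<Instant> occurrences = new ArrayList<>();
        for (Instant t = start; !t.isAfter(end); t = t.plus(period)) {
            occurrences.add(t);
        }
        return occurrences;
    }
}

final class ActionDeletePersonalData extends Action { }

// One set of triggers plus the action to perform when one of them occurs.
final class Obligation {
    final List<Trigger> triggersSet;
    final Action action;

    Obligation(List<Trigger> triggersSet, Action action) {
        this.triggersSet = triggersSet;
        this.action = action;
    }
}

final class ObligationDemo {
    public static void main(String[] args) {
        Instant collected = Instant.parse("2011-01-01T00:00:00Z");
        // Data retention: delete 30 days after collection, within one day.
        TriggerAtTime retention =
            new TriggerAtTime(collected.plus(Duration.ofDays(30)), Duration.ofDays(1));
        Obligation delete =
            new Obligation(List.<Trigger>of(retention), new ActionDeletePersonalData());
        System.out.println(delete.triggersSet.size() + " trigger(s) registered");
    }
}
```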


20.4.5 Package StickyPolicy 


The StickyPolicy package contains all of the data classes to define the result of the
matching process. This package is composed of three sub-packages.


20.4.5.1 Sub-package impl


The sub-package impl of the sticky policy package contains the skeleton of the
data structure that holds the result of the matching process. The StickyPolicy class
is the main component. It contains a matching attribute, which indicates whether
the final matching is true (there is a match) or not (there is a mismatch), and it is
composed of several AttributeType elements.

The AttributeType class represents a PII item that takes part in the matching. For
example, if the policy contains three reveal actions under a particular data
handling policy, one for each of three PII items, the StickyPolicy object contains
three AttributeType objects. Like the StickyPolicy class, the AttributeType class
contains:


• A matching attribute that indicates whether the PII matches or not.
• An attributeURI element that represents the attribute name of the PII, and
  which is composed of:


  – An authorisationsSet element, which represents the authorisations set of the
    sticky policy resulting from the matching;

  – An obligationsSet element, which represents the obligations set of the sticky
    policy resulting from the matching process, and a mismatch element in case
    the matching results in a mismatch.


• The MismatchesType class contains two elements: authorisationsMismatch and
  obligationsMismatch. These elements are only present if a mismatch of the
  corresponding type occurs. Each mismatch type is defined in a separate package,
  because the PPL language is extensible and the definition of a mismatch depends
  on the data handling type.
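
The shape of the matching result can be sketched in Java as follows. This is a
simplification under assumptions: the mismatch details are reduced to plain strings
standing in for the classes of the mismatch sub-packages, the authorisations and
obligations sets are omitted, and deriving the overall matching flag as the
conjunction of the per-attribute flags is an assumption of this example, since the
text does not state how the engine computes it.

```java
import java.util.ArrayList;
import java.util.List;

// Holds the mismatch details; a field is null when no mismatch of that type occurred.
final class MismatchesType {
    final String authorisationsMismatch; // stand-in for the authorisations mismatch class
    final String obligationsMismatch;    // stand-in for the obligations mismatch class

    MismatchesType(String authorisationsMismatch, String obligationsMismatch) {
        this.authorisationsMismatch = authorisationsMismatch;
        this.obligationsMismatch = obligationsMismatch;
    }
}

// Matching result for one piece of PII.
final class AttributeType {
    final String attributeURI;
    final boolean matching;
    final MismatchesType mismatches; // null when the PII matches

    AttributeType(String attributeURI, boolean matching, MismatchesType mismatches) {
        this.attributeURI = attributeURI;
        this.matching = matching;
        this.mismatches = mismatches;
    }
}

// Overall result of the matching process.
final class StickyPolicy {
    final List<AttributeType> attributes = new ArrayList<>();

    // Assumption: the final matching is true only if every attribute matches.
    boolean isMatching() {
        for (AttributeType attribute : attributes) {
            if (!attribute.matching) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StickyPolicy result = new StickyPolicy();
        result.attributes.add(
            new AttributeType("http://www.w3.org/2006/vcard/ns#email", true, null));
        result.attributes.add(new AttributeType("http://www.fgov.be/eID/address", false,
            new MismatchesType("purposes differ", null)));
        System.out.println("match: " + result.isMatching());
    }
}
```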


20.4.5.2 Sub-package Authorisations Mismatch 


The sub-package Authorisations Mismatch contains the data classes of the
authorisations mismatch. The AuthorisationsMismatchType class is the main class.
It contains a mismatchId attribute, which is used to refer to it from within the
AuthorisationsSet element, and it is composed of either an
AuthorisationsSetMismatchType, or an AuthzUseForPurposeType and an
AuthzDownStreamUsageType (together or only one of the two).

The Mismatch class is defined to help the engine and the UI display the mismatching
elements, and to permit the user to make a decision without having to go back to
his preferences and compare them with the sticky policy.

We distinguish the two kinds of authorisation mismatch using the two classes
AuthzUseForPurposeType and AuthzDownStreamUsageType. In some cases, we can have
an authorisation use-for-purpose mismatch, an authorisation downstream usage
mismatch, or both together. To notify and describe the mismatch, we use the same
concept as mentioned above: we indicate the policy values (which represent the Data
Controller values) and the preferences values (which represent the Data Subject
values).


20.4.5.3 Sub-package Obligations Mismatch


To notify and describe the mismatch, we use the same concept mentioned previously 
in the authorisation mismatch. We use the matching attribute to indicate whether a 
match occurs or not, and we indicate the policy values (which represent the Data 
Controller values) and the preferences values (which represent the Data Subject 
values), in case of matching. 
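
The concept shared by authorisation and obligation mismatches, a matching flag
plus the two compared values, can be sketched with one generic Java class. The
MatchResult and MismatchSketch names are hypothetical, and the two comparison rules
(the policy purposes must all be allowed by the preferences; the policy must not
promise a longer maxDelay than the preferences allow) are assumptions chosen for
illustration, not the engine's documented matching rules.

```java
import java.time.Duration;
import java.util.List;

// Outcome of comparing one policy value with one preference value.
final class MatchResult<T> {
    final boolean matching;
    final T policyValue;     // Data Controller value
    final T preferenceValue; // Data Subject value

    MatchResult(boolean matching, T policyValue, T preferenceValue) {
        this.matching = matching;
        this.policyValue = policyValue;
        this.preferenceValue = preferenceValue;
    }
}

final class MismatchSketch {
    // Assumed rule: every purpose in the policy must be allowed by the preferences.
    static MatchResult<List<String>> matchPurposes(List<String> policy,
                                                   List<String> preferences) {
        return new MatchResult<>(preferences.containsAll(policy), policy, preferences);
    }

    // Assumed rule: the policy may not promise a longer delay than the preferences allow.
    static MatchResult<Duration> matchMaxDelay(Duration policy, Duration preferences) {
        return new MatchResult<>(policy.compareTo(preferences) <= 0, policy, preferences);
    }

    public static void main(String[] args) {
        MatchResult<Duration> delay =
            matchMaxDelay(Duration.ofDays(60), Duration.ofDays(30));
        // A UI could show both values so the user can decide without reopening
        // the preferences.
        System.out.println("matching=" + delay.matching
            + " policy=" + delay.policyValue + " preference=" + delay.preferenceValue);
    }
}
```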


20.5 Conclusion 


In this chapter, we documented how we implemented the PPL engine in charge of
interpreting and enforcing the policies and preferences defined by the Data
Controllers and the Data Subjects. We detailed the symmetric architecture
supporting the PPL engine by explaining how each component works and interacts
with the others. The chapter mainly describes the implementation options chosen
to develop the engine according to the requirements defined in the previous
chapters.


usage control. In 4th Intl. Workshop on Security and Trust Management. Elsevier, June 
2008. 


References Part IV 379 


{Run06] M. Rundle. International data protection and digital identity management tools (using 
icons to express user preferences). Presentation at IGF2006 PrivacyWorkshop 1, Athens 
2006, 2006. 

[RZNt 05] T. Ryutov, L. Zhou, C. Neuman, T. Leithead, and K.E. Seamons. Adaptive trust ne- 
gotiation and access control. In Proc. of the 10th ACM Symposium on Access Control 
Models and Technologies, Stockholm, Sweden, June 2005. 

[SCFY96] R.S. Sandhu, E.J. Coyne, H.L. Feinstein, and C. E. Youman. Role-based access control 
models. IEEE Computer, 29(2):38-47, 1996. 

{SDO1] P. Samarati and S. De Capitani di Vimercati. Access control: Policies, models, and mech- 
anisms. In R. Focardi and R. Gorrieri, editors, Foundations of Security Analysis and 
Design, volume 2171 of LNCS. Springer-Verlag, 2001. 

[SWW97] K. E. Seamons, W. Winsborough, and M. Winslett. Internet credential acceptance poli- 
cies. In Proc. of the Workshop on Logic Programming for Internet Applications, Leuven, 
Belgium, July 1997. 

{[SWYO1] K. Seamons, M. Winslett, and T. Yu. Limiting the disclosure of access control poli- 
cies during automated trust negotiation. In Proc. of the Network and Distributed System 
Security Symposium (NDSS 2001), San Diego, CA, USA, April 2001. 

[U-P07] Credentica. U-Prove SDK overview: A Credentica white paper, 2007. http://www. 
credentica.com/files/U-ProveSDKWhitepaper.pdf. 

[W3C01] P3p v1.1. 2001. 

[W3C02] W3C. A P3P preference exchange language 1.0 (APPEL1.0), 2002. 

[W3C03] W3C. Enterprise privacy authorization language (epal 1.2). 
http://www.w3.org/Submission/2003/SUBM-EPAL-20031110/, 2003. 

[W3C06a] W3C. The platform for privacy preferences 1.1 (P3P1.1) specification, 2006. 

[W3C06b] W3C. Web services policy 1.2 - framework — (ws-policy). 
http://www.w3.org/Submission/WS-Policy/, 2006. 

[Wan04] Xin Wang. Mpeg-21 rights expression language: Enabling interoperable digital rights 
management. [EEE MultiMedia, 11(4):84—87, 2004. 

[WCJS97] M. Winslett, N. Ching, V. Jones, and I. Slepchin. Assuring security and privacy for 
digital library transactions on the web: Client and server security policies. In Proc. of the 
4th International Forum on Research and Technology Advances in Digital Libraries (ADL 
’97), Washington, DC, USA, May 1997. 

[Web06] Web services policy framework. http://www.ibm.com/developerworks/ 
webservices/library/specification/ws-polfram/?S\_TACT= 
105AGX04$\&$S\_CMP=LP, March 2006. 

[WSJOO] W. Winsborough, K. E. Seamons, and V. Jones. Automated trust negotiation. In Proc. 

of the DARPA Information Survivability Conference & Exposition (DISCEX 2000), Hilton 

Head Island, SC, USA, January 2000. 

[WWJ04] L. Wang, D. Wijesekera, and S. Jajodia. A logic-based framework for attribute based 

access control. In Proc. of the ACM Workshop on Formal Methods in Security Engineering 

(FMSE 2004), Washington, DC, USA, October 2004. 

[YFAT08] D. Yao, K.B. Frikken, M.J. Atallah, and R. Tamassia. Private information: To reveal or 

not to reveal. ACM Transactions on Information and System Security (TISSEC), 12(1):1- 

27, October 2008. 

[YWO3] T. Yu and M. Winslett. A unified scheme for resource protection in automated trust 
negotiation. In Proc. of the IEEE Symposium on Security and Privacy, Berkeley, CA, 
USA, May 2003. 

[YWS03] T. Yu, M. Winslett, and K.E. Seamons. Supporting structured credentials and sensi- 
tive policies trough interoperable strategies for automated trust. ACM Transactions on 
Information and System Security (TISSEC), 6(1):1-42, February 2003. 


Part V 
Infrastructures for Privacy and Identity 
Management 


Introduction 


The establishment of identity management infrastructures on a global scale is
notoriously difficult. Microsoft Passport failed here, in spite of a hard-to-match
installed client base, allegedly for reasons of trust and privacy [KR00]. Identity
management platforms today hardly span domains and bring together only a handful of
services. Lack of trust in any given organisation, technological problems with
security and privacy, and compliance and liability issues are only some of the
obstacles to the establishment of a more global identity management system [BHTB05].
In the course of, e.g., the PRIME project [CLS11], it became clear that:


• Business models for privacy and privacy-enhancing IdM are not trivial.
• Infrastructure aspects are often overlooked, although they have a significant impact
on the adoption of IdM solutions and on the security and privacy functionality of IdM
systems in general, and of privacy-enhancing identity management systems specifically.


Also, a wide range of protocols and implementations are available in this field, 
making the selection of attractive, useable and applicable components non-trivial. 
Comparable research in this area is not known. A further reason that makes the es- 
tablishment of global identity management systems difficult is the complexity of 
infrastructure aspects, as 


• often every element has some relation to every other element of an infrastructure
and

• identity management infrastructures must be interoperable among themselves or
with existing legacy solutions.


This part of the book cannot cover all aspects of infrastructure and infrastructure
research, but concentrates on the three most relevant aspects:


1. Privacy for service-oriented architectures (Chapter 21): How can privacy be
integrated into service-oriented architectures, which define more and more aspects of
internet-based business?

2. Privacy and Identity Management on Mobile Devices (Chapter 22): Emerging 
Technologies and Future Directions for Innovation. 

3. Privacy by sustainable identity management enablers (Chapter 22.9): To optimise 
sustainability, an economic valuation approach for telco-based identity manage- 
ment enablers is presented. 


Together these chapters address the roles that networks, or network architectures, 
devices, and services play for infrastructures considering the interests of the respec- 
tive stakeholders. 


Chapter 21 
Privacy for Service Oriented Architectures 


Ulrich Pinsdorf, Laurent Bussard, Sebastian Meissner, Jan Schallabock, and Stuart 
Short 


Abstract This chapter describes requirements for privacy in service-oriented
architectures. It collects 39 legal and technical requirements, grouped into five
categories. These requirements are the starting point for a technical framework that
brings privacy-enhanced data handling to multi-layered, multi-domain service
compositions. We describe an abstract framework that is technology agnostic and allows
for late adoption, even in already existing SOA applications. We describe the general
building blocks that are necessary on a PII provider’s side and on a PII consumer’s 
side. Finally, we look at the technical implementation of a very common, yet com- 
plicated aspect: the composition of policies when composing information artifacts. 
We describe how the composition of data influences the composition of policies. 


21.1 Introduction 


SOA is a technology-independent architecture concept adhering to the principle of 
service-orientation. It aims at enabling the development and usage of applications 
that are built by combining autonomous, interoperable, discoverable, and potentially 
reusable services. These services jointly fulfill a higher-level operation through 
communication. They fall into the class of distributed systems [CDK05].

One core principle of SOA is the so-called loose coupling of partial services: sin- 
gle services are not permanently bound to each other, rather their binding happens 
only at run-time, enabling a dynamic composition of services [CK05]. Moreover, it 
is even feasible to dynamically bind services hosted in different security domains 
and by different legal entities. We refer to this as “cross-domain service
composition” [BNP09]. One prominent example of this is the provision of services via
so-called “service chains” that comprise several partial services offered by different
organisations. To facilitate the use of such services, one legal entity usually serves
as a single point of contact for (potential) customers. Currently, in the Internet era, the


J. Camenisch et al. (eds.), Privacy and Identity Management for Life, 383 
DOI 10.1007/978-3-642-20317-6_21, © Springer-Verlag Berlin Heidelberg 2011 




locations of organisations providing partial services for one high-level service can 
be widely distributed around the globe. 

In many cases, an SOA might involve the processing of personal data and thus 
pose risks for the privacy of the data subjects concerned. Two specific risks can be 
identified with regard to cross-domain service composition. The first concerns the 
lack of transparency with regard to processing personal data: the involvement of 
different legal entities may lead to the situation where data subjects are no longer 
aware of what data are handled by what entity for what purpose. Data is exchanged 
between service providers and the user can only guess which data goes where. This 
is particularly true if services are invoked dynamically at run-time. In case a high- 
level service delivery involves different organisations, but is exposed by only one of 
them, customers might not even be aware of the involvement of further legal entities 
at all. 

The second risk concerns the issue of data linkability: The use of standardised 
formats and interfaces within an SOA facilitates the linkage of systems and data 
sets. Since SOA method calls transport typed and semantically well-defined data, 
it is easy to use this meta-information to link the transmitted data with other data 
sets. Without the implementation of the appropriate technical and organisational 
measures, organisations could be able to link different sets of personal data and 
generate profiles on data subjects. 

However, the implementation of an SOA also provides some options to achieve 
a high level of privacy for the data subjects concerned. First, each single service 
that forms part of an SOA usually serves a specific purpose such as authentica- 
tion or payment. In combination with privacy-compliant logging techniques, this 
circumstance can be used to implement an automated review of adherence to the 
privacy principle of purpose limitation. Second, tailoring single services to specific 
purposes simplifies the determination of personal data that are really needed for the 
implementation of the respective service. This circumstance facilitates adherence to 
the privacy principles of collection, use, and disclosure limitation as well as obedi- 
ence to the principle of data minimisation. Third, an SOA provides possibilities for 
the implementation of an automated data protection management. This results from 
the fact that, nowadays, technical integration of an SOA typically takes place on the 
basis of web services and XML. As the same holds true for existing and emerging 
standards for an automated data protection management (e.g., P3P, EPAL, XrML), 
these standards could easily be applied within an SOA.

As in Part IV and especially in Chapter 17, we follow the principle of “downstream”
data usage. An entire application is modularised into services. Services are
specialised in a certain part of an overall workflow and invoke other services as nec- 
essary. This principle usually leads to a chain (or even a tree or graph) of service 
invocations. In terms of privacy, this implies that a service provider whose service 
is a downstream part (those that process data later) of the overall workflow must ad- 
here to policies given by service providers whose services are upstream parts (those 
that process data first) of the workflow (cf. Requirement 27 on page 390). 
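The downstream principle can be illustrated with a minimal sketch. The `Policy` class and its three fields are hypothetical simplifications and not a policy language from this book; the check assumes that "adhering" means the downstream policy permits nothing that the upstream policy forbids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified data handling policy."""
    purposes: frozenset      # purposes for which the data may be used
    retention_days: int      # maximum retention period in days
    recipients: frozenset    # parties the data may be forwarded to


def adheres_to(downstream: Policy, upstream: Policy) -> bool:
    """True if the downstream policy permits nothing the upstream one forbids."""
    return (
        downstream.purposes <= upstream.purposes
        and downstream.retention_days <= upstream.retention_days
        and downstream.recipients <= upstream.recipients
    )


shop = Policy(frozenset({"order", "payment"}), 90, frozenset({"bank", "courier"}))
courier = Policy(frozenset({"order"}), 30, frozenset())
advertiser = Policy(frozenset({"order", "marketing"}), 30, frozenset())

print(adheres_to(courier, shop))     # the courier stays within the shop's policy
print(adheres_to(advertiser, shop))  # "marketing" was never permitted upstream
```

Because the check is a plain subset and upper-bound comparison, it can be repeated at every hop of a service chain, each time treating the invoking service as the upstream party.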

In Section 21.2, we summarise 39 requirements for privacy in SOAs, grouped 
into five categories. These requirements set the scope for the abstract privacy
framework, which we describe in Section 21.3. Finally, Section 21.4 looks at the
technical implementation of a very common, yet complicated aspect: the compo- 
sition of policies when composing information artifacts. We conclude this chapter 
with an outlook and some thoughts on open issues. 


21.2 Requirements for Privacy in SOA 


Service-Oriented Architectures present new opportunities and challenges for privacy
and data protection. The potentially increased distribution of personal data across
multiple domains makes subject access requests difficult to handle. Which service
processed what data? Who should be addressed for liability issues? At the same time,
service orientation offers a new approach to the granularity of data processing,
allowing clearer responsibilities and better auditing.

This section describes a comprehensive set of requirements for Service-Oriented 
Architectures. If the requirements are applied in the construction of Service-Oriented 
Architectures, legal compliance with privacy legislation is facilitated. Moreover, the 
requirements may provide guidance for the design of privacy enhancing Service- 
Oriented Architectures. They include the privacy risks and opportunities resulting 
from the implementation of Service-Oriented Architectures within one organisation, 
but also across different organisations (cross-domain service composition). 

In particular, such cross-domain service composition involves new privacy risks. 
The involvement of different legal entities may lead to the situation where data 
subjects are no longer aware of what data relating to them are being handled, which 
entity is doing so and for what purpose. Furthermore, the use of standardised formats 
and interfaces across different security domains facilitates the linkage of data sets 
and thus allows for profiling of data subjects. 

On the other hand, SOAs can also provide several options to improve privacy 
and data protection. First, one typical property of any SOA is that each single ser- 
vice could be mapped to specific purposes. This circumstance facilitates the imple- 
mentation of an automated review of adherence to the privacy principle of purpose 
limitation. Second, the tailoring of single services to specific purposes also sim- 
plifies the determination of personal data that are really needed for the respective 
service. It thus eases adherence to the privacy principles of collection, use, and dis- 
closure limitations as well as compliance with the principle of data minimisation. 
Third, as the technical integration of an SOA typically takes place on the basis of 
web services and XML, it provides some possibilities for the implementation of an 
automated data protection management. 

For reasons of brevity, the requirements in this section are laid out with few or
no examples. For a better understanding of the concepts described, see [MS09].


21.2.1 Core Policy Requirements


Policies are used by service providers to describe restrictions on the processing of 
personal data. From a privacy point of view, policies on purpose limitation, non- 
disclosure and data retention period are of major importance. 


No. 1: Policies should be available in an unambiguous formalisation. Thereby, 
the content of policies should be machine interpretable. 


Since policies should be available for automatic processing and comparison with 
user preferences, they have to be available in a machine-interpretable form. To avoid 
misinterpretation of policies and thus reduce legal conflicts, unambiguity in the for- 
malisation is necessary. 
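As an illustration of what a machine-interpretable and unambiguous form could look like, the following sketch serialises a policy as canonical JSON. The field names and the set of required fields are assumptions made for this example; established policy languages define their own, richer schemas.

```python
import json

# Fields every policy must carry in this sketch (a hypothetical schema).
REQUIRED_FIELDS = {"controller", "jurisdiction", "data", "purposes", "retention_days"}


def serialise(policy: dict) -> str:
    """Canonical JSON: sorted keys and fixed separators, so the same
    dictionary always yields the same text."""
    return json.dumps(policy, sort_keys=True, separators=(",", ":"))


def parse(text: str) -> dict:
    """Parse a policy and reject it if a required field is missing."""
    policy = json.loads(text)
    missing = REQUIRED_FIELDS - set(policy)
    if missing:
        raise ValueError("policy lacks fields: " + ", ".join(sorted(missing)))
    return policy


policy = {
    "controller": "Example Shop Ltd.",   # hypothetical responsible entity
    "jurisdiction": "DE",
    "data": ["email", "postal_address"],
    "purposes": ["order_fulfilment"],
    "retention_days": 90,
}
wire_format = serialise(policy)
print(wire_format)
```

A fixed textual form also helps with the later requirements: two parties that compare or hash the same policy obtain the same result regardless of the key order in which it was written.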


No. 2: It must be ensured that communicated policies cannot be disputed by
the ensuring entity. 


Policies must be binding, i.e., the ensuring entity must not be able to dispute their 
existence and exact content. 


No. 3: Policies must be easily accessible to users. Accessing the policies 
should be determined by a clear specification. 


Potential users of a service should be able to see the policies of every service 
provider without trouble. A standardised means of access could be made available, 
but should only require a minimum of personal information about the user exercis- 
ing his/her right of access. 


No. 4: Policies should be presented to users in an easily comprehensible man- 
ner. 


As policies can be very complex, users that do not have detailed legal knowledge 
might not be able to understand and assess them. Thus, policies should be described 
in a manner that is easily comprehensible to the general public. Hereby, the principle 
of transparency is put into effect. 


No. 5: It must be explicitly specified who is responsible for the policy, in- 
cluding a reference to the applicable jurisdiction. This specification must be 
visible to users. 


The specification of responsibility fosters the principle of accountability. 


No. 6: It must be explicitly specified what data are covered by a policy. This
specification must be visible to users. 


A clear link between data and policy is needed, since different services of one
provider may apply different policies to different parts of one set of data.
It is advisable to communicate policies for each purpose separately, which
facilitates the principle of purpose limitation.


No. 7: Policies should cover all aspects of data processing with regard to
privacy legislation.


Policies can be more or less detailed. In order to prevent unnecessary complexity, 
they should not be more detailed than legally/contractually necessary. Taking a lay- 
ered approach may additionally foster the principle of transparency. 


No. 8: Recipients or categories of recipients to which the data will be passed
on must be explicitly specified. This must include a reference to the applicable
jurisdiction for the recipient.


The specification of (categories of) recipients fosters the principle of transparency. 


No. 9: It should be explicitly specified under what policies data is passed on 
to other parties. 


If personal information is passed down a service chain, the receiving service 
provider is legally bound with regard to what it may do with this data. As this may 
be different from what the originating service may do, it should be reflected in a 
separate policy. 


21.2.2 Privacy Logging Requirements 


No. 10: Logging data should be unambiguously formalised and represented 
in a machine interpretable format. 


If logging takes place in a log file jointly used by different organisations that form 
part of a cross-domain service composition, a common log format is to be specified. 
Even if each service provider generates its own log files, the use of a standardised 
log format facilitates partly automated access to logging data. 


No. 11: It must be possible to check the compliance of processing operations
with communicated policies on the basis of log files afterwards. 


Using log files allows for the reconstruction of data processing by the service. Thus, 
it is possible to match policies and log files and to identify incidents that are not 
compliant with the policies. Adherence to this requirement brings into effect the 
principle of accountability. 
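The matching of policies and log files can be sketched as follows. The `LogEntry` fields and the representation of a policy as a mapping from data items to allowed purposes are assumptions for this example; a real log format would carry more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    originator: str   # service provider that wrote the entry
    data_item: str    # personal data the entry refers to
    action: str       # e.g. "read", "modify", "transmit"
    purpose: str      # purpose claimed for the action


def non_compliant(log, allowed_purposes):
    """Return entries whose purpose the policy does not allow for that data item.

    `allowed_purposes` maps a data item to the set of purposes communicated
    for it; data items without any policy are treated as not allowed at all.
    """
    return [
        entry for entry in log
        if entry.purpose not in allowed_purposes.get(entry.data_item, set())
    ]


policy = {"email": {"order_confirmation"}, "address": {"delivery"}}
log = [
    LogEntry("shop", "email", "read", "order_confirmation"),
    LogEntry("courier", "address", "read", "delivery"),
    LogEntry("shop", "email", "transmit", "marketing"),
]
incidents = non_compliant(log, policy)
for entry in incidents:
    print(entry.originator, entry.data_item, entry.purpose)
```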


No. 12: It must be ensured that log files cannot be contested by their originat- 
ing entity in charge of the processing. 


Not only policies (Requirement 2), but also log files must be binding. The originator 
must not be able to dispute that it generated the log file in its existing form. 
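One building block for binding log files is a hash chain, sketched below with the Python standard library. This is an illustration under stated assumptions, not a mechanism prescribed by this chapter: a hash chain only makes later modification of entries detectable. For the originator to be unable to dispute authorship, the latest digest would additionally have to be digitally signed, which is omitted here.

```python
import hashlib
import json

GENESIS = "0" * 64  # digest used as predecessor of the first entry


def _digest(previous: str, entry: dict) -> str:
    body = json.dumps(entry, sort_keys=True)
    return hashlib.sha256((previous + body).encode("utf-8")).hexdigest()


def append(chain: list, entry: dict) -> None:
    """Append an entry whose digest also covers the preceding digest."""
    previous = chain[-1]["digest"] if chain else GENESIS
    chain.append({"entry": entry, "previous": previous,
                  "digest": _digest(previous, entry)})


def verify(chain: list) -> bool:
    """Recompute every digest; any altered or reordered entry breaks the chain."""
    previous = GENESIS
    for record in chain:
        if record["previous"] != previous:
            return False
        if record["digest"] != _digest(previous, record["entry"]):
            return False
        previous = record["digest"]
    return True


chain = []
append(chain, {"originator": "shop", "action": "read", "data": "email"})
append(chain, {"originator": "courier", "action": "read", "data": "address"})
print(verify(chain))
```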


No. 13: The fact that data are logged must be visible to the user.


When the processing of personal information is logged, the logs will most likely
contain personal information as well. The user must be informed of the fact that 
logging is applied, and information on the specific logs may be in scope for subject 
access requests. 


No. 14: The originator of a logging entry must be clearly visible. In particular, 
it must be visible which service provider of a cross-domain service composi- 
tion is the originator of a certain logging entry. 


One purpose of logging is proving the legality of the data processing. It must there- 
fore be clear which entity created a log entry. This is especially relevant if several 
entities write to the same log file. 


No. 15: A simple methodology must enable the user to access logging data 
that s/he has a legal right to access, or that the service provider wants to 
grant access to. 


The user’s right of access includes the right to know what data have been pro- 
cessed for what purpose and whether they were changed. In some cases, the service 
provider might be interested in allowing access beyond what is needed for legal 
compliance to support the trust relationship with the user. 


No. 16: It must be clear to which data a log entry refers.


Logs are one source of information for subject access requests. For this purpose, 
logs must describe actions that were applied to personal information (such as mod- 
ifications, transmissions, possibly also simple reading access). These actions could 
be applied to different kinds of data. Therefore the log must be unambiguous in 
describing to which data it refers. 


No. 17: Log files should describe all contractual and further legally relevant 
aspects of data processing. Beyond that, technical aspects should only be de- 
scribed in case they are relevant. 


Obviously, logs can become extensive, and large amounts of data can be produced.
Not all actions need to be logged, however, but only those that are relevant with
regard to data protection (in particular the right of access). This covers most of the
actions applied to the data, especially changes, corrections, or deletions.


No. 18: Log files must contain explicit information on recipients or categories 
of recipients data have been passed on to. This includes a reference to the 
applicable jurisdiction. 


This requirement is derived from the legal duty to ensure transparency with regard
to recipients or categories of recipients of personal information.


21.2.3 Requirements for Access to Personal Information


No. 19: Access to personal information should be provided in an unambigu- 
ous formalisation. The content of the information should be machine inter- 
pretable. 


Unambiguousness of formalisation supports the correct interpretation of accessed 
information and prevents possible legal disputes about differently interpreted infor- 
mation. A machine interpretable formalisation enables users of a service to analyse 
accessed data in a partly automated manner. 


No. 20: It must be ensured that access to information that has been granted
cannot be disputed by the granting entity. 


The response to subject access requests must be binding. The granting entity must 
not be able to dispute the information it has communicated. 


No. 21: A simple methodology with regard to request and granting of access 
to information should be provided to users. 


A standardised procedure for the granting of access should be used in order to keep
the effort for such access low on both sides. With standardised clauses, a partial
automation of the process could be feasible.


No. 22: Users accessing information must be enabled to easily recognise what 
data covered by what policy have been disclosed to what third parties. 


If the personal information of users is processed when a service is invoked, they 
have the right to obtain information from the service provider about categories of 
processed data, purposes of the processing, and recipients or categories of recipients. 


No. 23: Accessed information should cover only contractual or further legally 
relevant aspects of data processing. 


Service providers are legally obliged to grant users access to specific information 
(see Requirement 17). In principle, the accessible information should be limited to 
this specific information in order to avoid too much complexity. This serves the 
principle of transparency. 


No. 24: Users must be enabled to access explicit information on recipients 
or categories of recipients that data have been passed on to. This includes a 
reference to the applicable jurisdiction. 


21.2.4 Cross-Domain-Specific Requirements 


No. 25: It must be possible to maintain communicated policies even if the
Service-Oriented Architecture is dynamically adapted (refers to the constellation
of an SOA being established by several entities).


It may happen that a member of a Service-Oriented Architecture leaves the
organisation and is replaced by another entity. Dynamic changes of this kind should be
possible without resulting in the need to negotiate policies once again with customers
or even in the necessity to terminate contracts with customers.


No. 26: If it is not possible to maintain (all) communicated policies in case
of an adaptation of the virtual organisation, it must be possible to adapt the 
communicated policies (builds on Requirement 25) through renegotiation. 


In cases of renegotiation, mechanisms have to be in place allowing for the adaptation 
of already communicated policies to the new conditions in mutual agreement. 


No. 27: A service provider whose service is a downstream part of the overall 
workflow must adhere to policies given by service providers whose services 
are upstream parts of the workflow. 


As the service provider who is in contact with the customer makes binding policies 
for the entire workflow, service providers whose services are downstream parts of 
the overall workflow have to adhere to these policies. 


No. 28: Multi-level-matching within a Service-Oriented Architecture must be 
supported. 


Multi-level-matching takes place when a Service A, which is invoked by a user, 
launches another Service B. In this case, Service A has to integrate the policies of 
Service B. 
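How Service A might integrate the policies of the services it launches is sketched below. The dictionary layout and the combination rule are assumptions for this example: the integrated policy conservatively announces everything that may happen anywhere in the chain, i.e., the union of purposes and recipients and the longest retention period.

```python
def integrate(own: dict, invoked: list) -> dict:
    """Combine a service's own policy with those of the services it invokes."""
    policies = [own, *invoked]
    return {
        "purposes": sorted(set().union(*(p["purposes"] for p in policies))),
        "recipients": sorted(set().union(*(p["recipients"] for p in policies))),
        "retention_days": max(p["retention_days"] for p in policies),
    }


service_a = {"purposes": ["order"], "recipients": ["service_b"], "retention_days": 30}
service_b = {"purposes": ["delivery"], "recipients": ["courier"], "retention_days": 60}

effective = integrate(service_a, [service_b])
print(effective)
```

If Service B in turn launches further services, `integrate` can be applied at that level first, so that the policy presented to the user reflects the whole tree of invocations.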


No. 29: The ability of the data subject to have access to information must be 
ensured for the future. 


If a service composition or a virtual organisation is decoupled, it may be difficult 
to identify all parties that participated in the specific service afterwards. Therefore, 
mechanisms are to be implemented that allow subject access requests even in such 
a case. 


No. 30: An ex post notice must be enabled by the appropriate mechanisms. 


If policies change, the user is to be informed subsequently (see Requirements 25 &
26). Therefore, mechanisms need to be implemented that allow for notice in multi- 
level workflows, even if the user is not known to all services. Equally, it must be 
possible for the user to accept the changes towards all included services. 


21.2.5 Requirements for Additional Mechanisms 


No. 31: It must be ensured that the correction and erasure of user data are
feasible.


Data protection legislation gives each data subject the right of rectification and
erasure of his/her data, to be exercised towards the controller of the processing. Thus, the
service provider must be capable of specifically manipulating personal information 
about users. 


No. 32: It must be ensured that blocking of user data is feasible.


If data can or may not be erased, there must be a mechanism in place that restricts 
their further use to the necessary minimum for the given situation. Partial blocking 
of subsets of a larger data set must be feasible. 
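Partial blocking can be pictured as a per-attribute flag on a record. The class below is a hypothetical sketch; in particular, the `retention_duty` switch stands in for whatever narrowly defined use remains permitted after blocking, which depends on the applicable legislation.

```python
class BlockableRecord:
    """Personal data record whose attributes can be blocked individually."""

    def __init__(self, values: dict):
        self._values = dict(values)
        self._blocked = set()

    def block(self, *attributes: str) -> None:
        """Block a subset of the record's attributes."""
        unknown = set(attributes) - set(self._values)
        if unknown:
            raise KeyError("unknown attributes: " + ", ".join(sorted(unknown)))
        self._blocked.update(attributes)

    def read(self, attribute: str, retention_duty: bool = False):
        """Blocked attributes are readable only for the remaining permitted use."""
        if attribute in self._blocked and not retention_duty:
            raise PermissionError(attribute + " is blocked")
        return self._values[attribute]


record = BlockableRecord({"name": "Alice", "iban": "DE00 0000", "email": "a@example.org"})
record.block("iban")
print(record.read("email"))                      # unaffected attribute
print(record.read("iban", retention_duty=True))  # still available for the narrow use
```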


No. 33: It should be made easy for users to exercise their rights of correction, 
erasure and blocking. 


As correction, erasure and blocking are initiated by the user, technical
feasibility as such (Requirement 31) is not sufficient. Rather, it should be
easy for users to exercise their rights.


No. 34: It should be possible to guarantee compliance with communicated 
policies. 


For this, a mechanism is needed that technically prevents the service provider from
infringing its policies, i.e., a part of the provider’s infrastructure must be
exempted from the provider’s direct control.


No. 35: There should be a possibility to support trust between user and service 
provider. 


There is a need for an infrastructure that enables users to come to trust an unknown 
provider. This can be built through reputation, amongst other mechanisms. 


No. 36: The user shall have the possibility to express his/her preferences in 
an easy manner. 


As users are quite often technical and legal amateurs, tools enabling them to express 
their preferences in a formalised manner (as defined by Requirement 1) should be 
made available to them. These tools should be easy to use. The preferences can be 
the basis of a partly automated negotiation of new policies. 


No. 37: User and service provider should be able to match preferences and 
related policies. 


In principle, a match of preferences and policies could be processed either on the 
service or on the user side. To allow for different market requirements, such a match- 
ing should be possible on both sides. If both parties are enabled to do the matching 
themselves, a manipulation by one party would become obvious to the other. 
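A matching procedure that both sides can run independently might look like the following sketch. The preference and policy fields are assumptions for this example. The function is deterministic and returns the reasons for a mismatch rather than a bare yes or no, so that user and provider obtain the same result and either can inspect it.

```python
def mismatches(preferences: dict, policy: dict) -> list:
    """Return human-readable reasons why `policy` violates `preferences`."""
    reasons = []
    unwanted = set(policy["purposes"]) - set(preferences["purposes"])
    if unwanted:
        reasons.append("purposes not accepted: " + ", ".join(sorted(unwanted)))
    if policy["retention_days"] > preferences["max_retention_days"]:
        reasons.append(
            f"retention of {policy['retention_days']} days exceeds "
            f"{preferences['max_retention_days']} days"
        )
    return reasons


preferences = {"purposes": ["delivery", "billing"], "max_retention_days": 60}
policy = {"purposes": ["delivery", "marketing"], "retention_days": 90}

for reason in mismatches(preferences, policy):
    print(reason)
```

An empty result means the policy matches the preferences; a non-empty result can serve as the starting point for negotiation.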


No. 38: Matching of preferences and policies must be comprehensible.


Matching must take place in such a way that both the user and the provider can
comprehend the result, and that the entity processing the matching can demonstrate that
it was done correctly. In those cases where preferences and policies do not match,
negotiation mechanisms could be employed that adapt the policies according to the
preferences or vice versa.


No. 39: Mechanisms to express the anonymity set with regard to a specific 
data type should be supported. 


Every operation should make a statement about the influence that its functionality 
has on the anonymity of a given set of data records. This allows for estimating the 
overall level of anonymity that a workflow provides. 


21.3 Abstract Framework Addressing the Lifecycle of Privacy 
Policies in SOAs 


This section gives a generalised overview of privacy-friendly data handling in cross- 
domain service compositions. We sketch a generic framework that describes the 
general processing steps to achieve privacy compliance and proper data handling. 
The framework is designed in a way that it addresses multi-step data sharing by the 
repeated application of the same principle. Hence, we follow the SOA design principle 
of chaining services by chaining our protocol to achieve proper data handling in 
a multi-domain service composition. The framework abstracts from concrete 
technologies and policy languages and is thus intended to support implementations based 
on arbitrary technologies for Service-Oriented Architectures. This approach allows 
for both an abstract consideration of privacy implications and late adoption. Late 
adoption means that any existing SOA application can be enhanced 
even after deployment. This may, of course, require invasive changes to the current 
implementation of the services, deployment of new components, and mutual agree- 
ment of data provider and data consumer. Finally, this framework focuses on the life- 
cycle of privacy policies and can be complemented by other privacy-enhancing tech- 
nologies that are described throughout this book, such as data encryption, anony- 
mous credentials, anonymous communication, trustworthy user interfaces etc. 

We introduce the idea for an abstract policy framework with a description of the 
simplest possible scenario, a client-server interaction. Client-server technology can 
be seen as the ancestor of Service-Oriented Architectures, yet it is still the nucleus 
of each service-oriented architecture pattern, since even the most complicated ser- 
vice interaction can be broken down to individual interactions between two entities, 
which represent a client and a server. 

Figure 21.1a shows the usual scenario for privacy policies, which is also 
applicable to client/server systems. The service exposes a privacy policy (e.g., P3P 
[W3C06]) expressing how it will handle collected data. The user has privacy 
preferences (e.g., APPEL [W3C02]) that reflect her expectations in terms of privacy. By 
comparing privacy preferences and privacy policies, the user (or user agent) 
determines whether it is suitable to share data. 


(b) PII exchange in Service Oriented Architecture 


Fig. 21.1: Comparison of privacy policies in client/server architectures and in 
Service-Oriented Architectures. 

Service-Oriented Architectures can be seen as applying client/server communi- 
cation in a recursive way. The user invokes a single service. The service does not 
perform the full operation itself, but invokes one or many other services to perform 
parts of the task. These invoked services in turn invoke other services. The result 
is a tree¹ of service invocations where each node represents a service. We can apply 
the same recursive pattern to privacy policies that are communicated between the 
individual services in an SOA. 
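
The recursion can be illustrated with a small sketch. The Python fragment below is only an illustration under our own assumptions: the `Service` class, the service names and the `pii_exchanges` helper are hypothetical and not part of the framework. It walks an invocation tree and lists every edge on which a caller hands data to a callee. On each of these edges the same exchange of preferences and policies would be repeated.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    invokes: list["Service"] = field(default_factory=list)


def pii_exchanges(caller: Service) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs for every invocation in the tree."""
    edges = []
    for callee in caller.invokes:
        edges.append((caller.name, callee.name))
        edges.extend(pii_exchanges(callee))  # same pattern, one level down
    return edges


# Hypothetical composition: the user calls a shop, which calls two other services.
shipping = Service("shipping")
payment = Service("payment")
shop = Service("shop", [payment, shipping])
user = Service("user", [shop])

exchanges = pii_exchanges(user)
```

In this example the shop appears once as callee and twice as caller, which corresponds to a service that first receives data and then passes it on.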


¹ In theory, the service invocations could even form a directed graph containing loops. But for the sake 
of simplicity, and based on common practice, we assume the invocation graph is a tree. 

Data is collected by a service (data controller) that may share it with third parties. 
When third parties act on behalf of the data controller, they are referred to as data 
processors (cf. Section 16.1.1). When third parties are in a different trust domain, they are 
referred to as downstream data controllers (cf. Section 17.1). In the latter case, the 
policy of the third party is taken into account when deciding whether data can be 
shared. 


21.3.1 Privacy Issues Arising from SOA 


Figure 21.1b shows the complexity added by a Service Oriented Architecture 
(SOA), where different parts of the service are offered in different trust domains. 
The extension from client/server to SOA adds a number of problems to the scenario 
with regard to privacy. 

PII provider is the role of entities sharing personal data with PII consumers. In 
most scenarios, the user (or data subject) is acting as PII provider. Sharing personal 
data with another party is generally restricted by privacy constraints (access control 
and/or expected usage control). Those privacy constraints can be locally specified 
(e.g., a data subject can specify privacy constraints on her data), can be external 
i.e., provided by another party (e.g., a data controller sharing collected data with 
a third party has to enforce constraints imposed by the data subject), or can be a 
combination of local and external constraints. 

PII consumer is the role of entities collecting personal data provided by PII 
providers. In most scenarios, a service (or data controller) is acting as PII consumer. 
The PII consumer is in charge of enforcing agreed usage control on collected data. Usage 
control is imposed by the PII provider and can be refined by the PII consumer. 

It is important to note that, unlike in the simple client/server setting, all services that are 
neither the root nor the leaves of the “invocation tree” may switch their roles during 
the process. For example, if a service typically receives a call in the role of a PII 
consumer, and the parameters of the call bear personal data, the service can switch 
to the role of a PII provider as soon as it invokes another service. This is based on 
the assumption that this second call (as PII provider) forwards part of the personal 
data that the calling service has just received. 

PII provider and PII consumer in a downstream scenario are not necessarily only 
machines. Figure 21.1a illustrates that users share data with services. However, a 
human being can also collect personal data and become a PII consumer. In this case, 
either this person has a privacy policy expressing how he/she handles collected data 
or sticky policies are shipped with the data. This is comparable to a license in rights 
management. 

Other issues arise from the distributed nature of an SOA. First of all, we have to 
assume that each service is part of a different trust domain. We assume that each 
service behaves as intended towards its upstream service, i.e., we have no malicious 
behaviour of the services. This is a common trust assumption that is enforced by 
reputation, audit, certification, and/or trusted hardware and software. 

One principle of SOAs is the late binding of services. That means that the 
concrete instance of an invoked service can be retrieved only at invocation time. This 
allows for the selection of a service that provides a certain service level. 

We can apply this principle in privacy-aware SOAs as well. A data provider may 
choose the data consumer from a number of data consumers based on the privacy 
policy of this service. We call this privacy-aware service discovery. It is interesting 
that privacy, when regarded as one attribute in a Service Level Agreement (SLA), 
competes with other SLA attributes such as price or quality of service. As a result, 
a service with a more user-friendly policy would appear more often or would rank 
higher during discovery. This may lead to competitive incentive to provide suitable 
privacy. 
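
A minimal sketch of privacy-aware service discovery is given below. It assumes, purely for illustration, that a privacy policy can be reduced to two attributes (a retention period and a flag for third-party sharing). The `Offer` class, its fields and the ranking rule are hypothetical and are not prescribed by the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    service: str
    price: float
    retention_days: int              # how long the service keeps PII
    shares_with_third_parties: bool


def acceptable(offer, max_retention_days, allow_sharing):
    """Check an offer against the data provider's (simplified) preferences."""
    return (offer.retention_days <= max_retention_days
            and (allow_sharing or not offer.shares_with_third_parties))


def discover(offers, max_retention_days, allow_sharing):
    """Filter out unacceptable offers, then rank by privacy first and price second."""
    candidates = [o for o in offers if acceptable(o, max_retention_days, allow_sharing)]
    return sorted(candidates, key=lambda o: (o.retention_days, o.price))


offers = [
    Offer("A", price=1.0, retention_days=365, shares_with_third_parties=True),
    Offer("B", price=2.0, retention_days=30, shares_with_third_parties=False),
    Offer("C", price=1.5, retention_days=90, shares_with_third_parties=False),
]
ranked = discover(offers, max_retention_days=90, allow_sharing=False)
```

Here service A is the cheapest but is never selected, which illustrates how privacy competes with other SLA attributes such as price.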

The enforcement of usage control (including access control when sharing data) 
is done by each party gaining access to a piece of data. This distributed enforcement 
works properly only when all involved parties adhere to the protection of the data. 
One way to verify that the data handling was done correctly is to compare 
the promised behaviour (sum of all sticky policies) with the executed actions, e.g., 
the log files of all policy enforcement points (PEP). This could be done by a trusted 
third party. This mechanism could even be federated among parties. That is, each 
party provides data for promised behaviour and executed actions. 
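
The comparison of promised behaviour and executed actions could be sketched as follows. The representation is an assumption made for this example only: a sticky policy is reduced to a set of permitted actions per data item, and the log of a policy enforcement point to a list of (data item, action) entries. The function name `audit` is hypothetical.

```python
def audit(sticky_policies, pep_log):
    """Return all logged (data_id, action) entries not covered by a sticky policy."""
    return [(data_id, action) for data_id, action in pep_log
            if action not in sticky_policies.get(data_id, set())]


# Promised behaviour: permitted actions per data item.
promised = {"email": {"store", "send-invoice"}, "address": {"store", "ship"}}
# Executed actions as recorded by the policy enforcement points.
log = [("email", "store"), ("email", "marketing"), ("address", "ship")]

violations = audit(promised, log)
```

An empty result means that every logged action was covered by the promised behaviour. Actions that were never logged cannot be detected this way.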

In summary, chaining services adds privacy issues to PII handling that go well 
beyond the relatively simple data sharing model in client-server settings. It is very 
difficult for the user to understand which PII goes where and why. In most cases 
the user is not even aware that PII he/she discloses to one service is shared with 
a third party. Moreover, due to dynamic binding, these third parties may only be 
known at the time of invocation. This situation does not satisfy the requirements we 
gave in Section 21.2 at all. Hence, we propose a slightly modified communication 
protocol between SOA entities and internal service workflows that help to fulfil 
the privacy requirements. 


21.3.2 Abstract Protocol 


We will now describe an abstract protocol that takes into account the complexity 
issues for privacy that arise from SOAs. The protocol is technology agnostic, which 
has the advantage that it may be implemented in various ways. It is a blueprint for 
how to build privacy-enhancing SOA applications or, in the case of late adoption, a 
blueprint for how to make an SOA application more privacy-preserving. 

Figure 21.2 shows the abstract framework from a high-level perspective. It shows 
a PII provider on the left and a PII consumer on the right. This interaction can be 
applied recursively on all segments in a service chain of an SOA application. A 
PII consumer will then take the role of a PII provider as explained earlier, when it 
invokes other services. 

The PII provider side consists of three building blocks: the PII provider 
behaviour, the PII store and the preferences store. The structure of the PII consumer 
side matches and extends the structure of the PII provider, which is an enabler for 
switching the roles from PII consumer to PII provider. The consumer part consists 
of the PII consumer protocol, the PII store, the policy store (which is comparable to 
the preference store), and an additional sticky policy store. 


Fig. 21.2: Overview of the generic privacy lifecycle in SOA applications. 

The PII Store is the database for storing personal data. Depending on scenarios, 
this can be a local database (e.g., on client machine), a service in the same trust 
domain (SQL server at data collector side), or a service offered by a trusted third 
party (e.g., cloud storage), a local credential store (certified PII), a remote creden- 
tial store (e.g., Security Token Service), or a combination of them. The Preferences 
Store stores all privacy constraints. The Preferences Store keeps track of constraints 
that are locally defined (and can be overwritten) and committed (and must be 
enforced). The Policy Store contains the information necessary to describe how data 
will be handled when sent to the PII consumer. A policy can be statically defined or 
derived from business processes. A policy can be generic or specific to a user (i.e., 
depending on authentication). Moreover, a policy can be local or can depend on 
external policies (e.g., the policies of downstream services). The SP Store (sticky 
policies store) stores the policies that were agreed upon between the PII provider 
and the consumer for a specific piece of information. The sticky policy is the result 
of matching² the PII provider’s preferences with the PII consumer’s policy. 

PII Provider and PII consumer communicate via a simple three-step protocol. 


1. Requested PII types. First, the PII provider asks the consumer side for the PII 
types that are needed for the service invocation. This request may have different 
technical incarnations. It could be a specification in a web form, it could be a 
service description such as WSDL³, but it could also be a dedicated method 
call to request this meta-data about the service that the PII provider intends to 
access. In any case, the PII provider learns about the types of information that 
are requested to perform this service. 

2. Policy request. In the second step, the PII provider asks for the policy that shall 
be applicable to the provided information. In other words, it requests a 
description (in a formal language) of the PII consumer’s commitment on how collected 
data will be handled. 

3. Service invocation. In the third step, the PII provider submits the requested PII 
together with a sticky policy for each data item. 


² Policy matching is defined in Chapter 17. 
³ WSDL stands for Web Services Description Language. 


Internally, this three-step protocol is backed up by many sub-procedures and 
decisions. We will look into these individually for the PII provider and the PII consumer. 
From this high-level perspective, it is important to understand that the PII store 
provides the personal data that is shared in the third protocol step. The preference store 
allows the retrieval and editing of stored information. The preference store 
participates in the creation of a sticky policy for the submitted data items. The privacy 
policy provided by the PII consumer side is compared with the preferences for this 
specific piece of information. The result of this matching process is a minimal 
policy that considers both the PII provider’s preference and the PII consumer’s policy. 
Please refer to Chapter 17 for more details on policy matching and the creation of a 
sticky policy. For now, it is enough to consider the sticky policy as the least common 
denominator between privacy preferences and a privacy policy. 
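
The three protocol steps and the role of the stores can be sketched in a few lines of Python. This is a minimal sketch under strong assumptions, not the framework's implementation: a policy is reduced to a set of purposes and a retention period, and `match` uses a deliberately simplified rule (intersection of purposes, shorter retention) to stand in for the matching defined in Chapter 17. All class, method and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    purposes: frozenset      # purposes the data may be used for
    retention_days: int      # maximum storage duration


def match(preference: Policy, policy: Policy) -> Optional[Policy]:
    """Assumed matching rule: keep only what both sides allow."""
    purposes = preference.purposes & policy.purposes
    if not purposes:
        return None          # nothing in common, hence no sticky policy
    return Policy(purposes, min(preference.retention_days, policy.retention_days))


class PiiConsumer:
    def __init__(self, required_types, policy):
        self.required_types = required_types
        self.policy = policy
        self.pii_store = {}
        self.sp_store = {}

    def requested_pii_types(self):              # protocol step 1
        return list(self.required_types)

    def policy_request(self):                   # protocol step 2
        return self.policy

    def invoke(self, pii, sticky_policies):     # protocol step 3
        self.pii_store.update(pii)
        self.sp_store.update(sticky_policies)
        return "service result"


def provide(pii_store, preferences, consumer):
    """PII provider side: run the three steps, or return None if nothing matches."""
    types = consumer.requested_pii_types()
    policy = consumer.policy_request()
    sticky = {}
    for pii_type in types:
        if pii_type not in pii_store or pii_type not in preferences:
            return None
        sticky_policy = match(preferences[pii_type], policy)
        if sticky_policy is None:
            return None
        sticky[pii_type] = sticky_policy
    return consumer.invoke({t: pii_store[t] for t in types}, sticky)


consumer = PiiConsumer(["email"], Policy(frozenset({"billing", "marketing"}), 365))
result = provide(
    {"email": "alice@example.org"},
    {"email": Policy(frozenset({"billing"}), 30)},
    consumer,
)
```

In this example the consumer would like to use the e-mail address for billing and marketing for one year, whereas the preference only permits billing for 30 days. The resulting sticky policy is the more restrictive combination of both.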

The internal components on the PII consumer’s side look very similar to the 
setting at the PII provider’s side. The policy store allows for retrieving and editing 
policies, which are sent in the second protocol step. The PII store is needed to store 
the PII received from the PII provider. While the PII is kept in the PII store, the 
sticky policy is stored in the sticky policy store. A linking mechanism, 
e.g., a dedicated link table in a database, keeps a relation between the piece of PII 
and the sticky policy. 
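
One possible realisation of such a link table is sketched below using an in-memory SQLite database. The table and column names as well as the textual policy encoding are assumptions made for this example only. Any storage technology that keeps the relation between PII and sticky policy would serve the same purpose.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE pii_store   (pii_id INTEGER PRIMARY KEY, type TEXT, value TEXT);
    CREATE TABLE sp_store    (sp_id  INTEGER PRIMARY KEY, policy TEXT);
    CREATE TABLE pii_sp_link (pii_id INTEGER REFERENCES pii_store(pii_id),
                              sp_id  INTEGER REFERENCES sp_store(sp_id),
                              PRIMARY KEY (pii_id, sp_id));
""")
db.execute("INSERT INTO pii_store VALUES (1, 'email', 'alice@example.org')")
db.execute("INSERT INTO sp_store VALUES (10, 'purpose=billing;retention=30d')")
db.execute("INSERT INTO pii_sp_link VALUES (1, 10)")


def policy_for(pii_id):
    """Look up the sticky policy linked to a piece of PII, or None if there is none."""
    row = db.execute(
        "SELECT s.policy FROM sp_store s "
        "JOIN pii_sp_link l ON l.sp_id = s.sp_id WHERE l.pii_id = ?",
        (pii_id,),
    ).fetchone()
    return row[0] if row else None
```

Because the link is kept in its own table, one sticky policy can be linked to several pieces of PII without being duplicated.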

This high-level protocol could be embedded in an existing application interac- 
tion. The last protocol step is usually the service invocation of any given SOA appli- 
cation. All we do is add an interaction step that requests the required information be- 
fore the actual invocation. This is nothing new. Service discovery mechanisms pro- 
vide this information anyway, but usually only on a data type level. For instance, a 
WSDL description clearly states that a function call calculatePension(date) 
requires the parameter to be a data type encoding a date. Currently, this service de- 
scription is usually on a syntactical level. In other words, the service description 
does not state that the submitted data has to be the user’s date of birth. However, 
semantic web technologies do specify this. Mechanisms for requesting policies are 
widely used for Service Level Agreements (SLA). Hence, both protocol steps 1 
and 2 are special forms of meta-data lookup. We require that this 
privacy-specific meta-data lookup is performed before the real service invocation. 

Figure 21.3 shows the Abstract Privacy Framework in more detail. The figure 
still contains the top-level components for the PII Provider and the PII consumer 
(dashed boxes). Furthermore, it shows the iterative approach by chaining from the 
PII consumer to another PII consumer. Each top-level component contains a best- 
practice workflow. This workflow is an ideal, scenario-independent view on data 
handling in a composed service. Moreover, the workflows introduce more compo- 
nents that should be part of each role’s technical representation. The next sections 
describe all top-level components from Figure 21.3. 


Fig. 21.3: Generic privacy lifecycle in SOA applications in detail. 


21.3.3 PII Provider 


PII Provider is the role of entities sharing personal data with PII consumers. 
In most scenarios, the user (or data subject) acts as PII provider. Sharing personal 
data with another party is generally restricted by privacy constraints (access control 
and/or expected data handling). Those privacy constraints can be locally specified 
(e.g., a data subject can specify privacy constraints on her data), can be external 
i.e., provided by another party (e.g., a data controller sharing collected data with 
a third party has to enforce constraints imposed by the data subject), or can be a 
combination of local and external constraints. The PII Provider’s role is essentially 
about deciding whether it is worth sharing pieces of personal data in order to obtain 
services from PII consumers. 

This section looks into the box labeled “PII Provider” in Figure 21.3. It illustrates 
the general steps a PII provider has to undertake internally in order to support the 
privacy-preserving protocol we described in Section 21.3.2. The figure is intended 
to be a block diagram, so it is neither just a workflow nor a collection of software 
components, but rather a mixture of both. The Service Discovery step refers to a 
mechanism to find and select one or more Data Controllers to be used and to obtain 
their meta-data, such as functionality, credentials, or SLA. Bringing together PII 
Provider and PII consumer is necessary before any interaction can take place. We 
call this phase service discovery even if, in some scenarios, it can be initiated by 
the PII consumer. In our picture, requesting the required PII and requesting privacy 
policies is covered in two separate steps, but they could technically be merged into 
a single meta-data request. 

PII Lookup is a mechanism to determine whether the PII requirements can be met 
by the personal data in the PII store. It aims at finding all combinations of personal 
data that may satisfy the request of the PII consumer. The PII Provider can have 
different personal data that match one element of the request and can even load or 
create new personal data (enter a date in a form, attach a picture) with no a priori 
attributes. The PII consumer may accept different types of personal data (e.g., name 
and age), may specify different attributes of PII (e.g., signed by a given third party), 
and may accept special combinations of personal data (e.g., credit card number and 
expiry date must come from the same credit card credential). 
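
The search for suitable combinations can be sketched as follows. The item layout (dictionaries with `type`, `value` and `credential` keys), the helper names and the sample values are hypothetical. The constraint shown is the credit card example from above, i.e., card number and expiry date must come from the same credential.

```python
from itertools import product


def pii_lookup(requested_types, pii_store):
    """Return every combination of stored items that covers all requested types."""
    options = [[item for item in pii_store if item["type"] == t]
               for t in requested_types]
    return list(product(*options))


def same_credential(combo):
    """Example constraint: all items must come from one credential."""
    return len({item["credential"] for item in combo}) == 1


pii_store = [
    {"type": "card_number", "value": "1111", "credential": "card-1"},
    {"type": "expiry_date", "value": "01/30", "credential": "card-1"},
    {"type": "card_number", "value": "2222", "credential": "card-2"},
    {"type": "expiry_date", "value": "12/29", "credential": "card-2"},
]

all_combos = pii_lookup(["card_number", "expiry_date"], pii_store)
valid = [combo for combo in all_combos if same_credential(combo)]
```

If one requested type cannot be served from the PII store, the lookup returns no combination at all, which corresponds to the non-matching case.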

The Policy Matching mechanism decides whether personal data can be shared 
according to its privacy constraints and the policy of the PII consumer. The 
selected PII must not only match the requested PII in type and semantics; the 
privacy preferences associated with the individual PII data items must also be satisfied. Privacy 
preference settings are always specific to a given piece of personal data. They can be the 
combination of different privacy preferences. For instance, an e-mail address can be 
subject to preferences related to any address, to any e-mail address, and to this spe- 
cific e-mail address. Moreover, a given piece of data can be subject to constraints 
(preferences and sticky policies) from different parties e.g., data issuer (for a cre- 
dential), data subject, or data controller (local preferences). The preferences of the 
selected PII must match with the privacy policy of the service. Chapter 17 describes 
this process in detail. The PII selection process may be very complicated since the 
selected PII and its associated preference mutually influence each other and must 
comply with the service policy. For instance, the user could have multiple credit 
cards with different privacy preferences. Hence, the user can influence the policy 
matching process by selecting the specific piece of PII and by changing the policy 
preferences attached to this PII. The block diagram reflects this with a loop around 
PII selection, policy matching, and PII selection. 

PII Selection is a process which allows the user to pick the appropriate PII from 
the PII store. PII selection can be seen as an extension of identity selection where 
not only minimal disclosure is taken into account but also privacy policies. 

The selection could be done automatically or in a manual process by the data 
controller. It is already a challenge to find a suitable solution in the search space 
across multiple user credentials, but it may be even more challenging to visualise 
this complexity to the user. If there is more than one suitable solution, the user must 
also be empowered to understand what the best solution is. This needs proper user 
interface support (cf. Section 14 for more details). Depending on the actual case, 
the user may want to pick the solution that enforces the most restrictive privacy 
preference, the solution that shares the least amount of data, or the solution which 
does not use a specific set of credentials. 
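
The three selection strategies mentioned above could be expressed as simple ranking and filter functions. This is a sketch under our own assumptions: each candidate solution is a dictionary listing the disclosed items, the retention period of the resulting sticky policy (used here as a stand-in for "most restrictive") and the credentials involved. All names are hypothetical.

```python
def least_data(candidates):
    """Pick the solution that discloses the fewest data items."""
    return min(candidates, key=lambda c: len(c["items"]))


def most_restrictive(candidates):
    """Pick the solution with the shortest retention period."""
    return min(candidates, key=lambda c: c["retention_days"])


def avoiding(candidates, credentials):
    """Keep only solutions that use none of the given credentials."""
    return [c for c in candidates if not set(c["credentials"]) & set(credentials)]


c1 = {"items": ["name", "age"], "retention_days": 30, "credentials": ["passport"]}
c2 = {"items": ["age"], "retention_days": 90, "credentials": ["student_card"]}
candidates = [c1, c2]
```

The strategies can disagree: here `least_data` selects the second candidate while `most_restrictive` selects the first, which is why the user interface has to explain the trade-off.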

Thinking about the non-matching case is as interesting as finding a valid combi- 
nation of credentials with associated preferences. When there is no suitable option, 
the user needs to understand why there is no match and be offered different options to 
proceed. The user could simply stop the transaction and not invoke the service at all. 
Probably he/she would repeat the service discovery step and pick another service 
that provides a similar functionality. The user could also continue the processing 
and violate his/her own preferences, or adapt the preference in the preferences store 
so that the PII selection yields at least one match. In principle, the service could 
adapt its privacy policy as well, e.g., if the user picks a more expensive service level 
(“premium service” vs. “free service”). 

If the user decides to amend his/her privacy preferences in order to achieve a 
match with the PII consumer’s policy, the step Change Preferences is taken. Note 
that different types of updates can be envisioned, ranging from adding a preference 
for this specific case to changing preferences that cover a larger group of PII 
consumers and/or personal data. 

Finally, when the PII selection was made that matches the policy, we create a 
mutual commitment between the PII provider and the PII consumer. We call this 
step commitment rather than agreement, since we see a policy negotiation as an 
optional step. In other words, we assume that for most use cases the privacy policy of 
the PII consumer is fixed and any adaptation will be made on the preferences of the 
data provider. Hence, this leads to a mutual commitment, but not necessarily to a 
negotiation. 

The agreement itself can be a sticky policy or just a Boolean response indicating 
an acceptance of the PII consumer’s policy. A more sophisticated technical 
implementation could even foresee a statement that is signed by both parties or witnessed 
by a trusted party. 

In case the agreement is expressed with a sticky policy, which we assume here 
without loss of generality since it is the most expressive form of agreement, the 
sticky policy may recursively specify usage control that must be enforced by the 
PII consumers (including downstream). Indeed, usage control may specify access 
control towards downstream services including downstream usage control. 

The action to attach a sticky policy involves the communication of the requested 
PII from the data provider to the data consumer together with the sticky policy. 
In other words, this step also involves the service invocation. When this step has 
been performed, the PII consumer possesses the requested PII and the sticky policy. 
Technically, the communication of PII and sticky policy can be performed in a sin- 
gle service call or in two separate calls. Moreover, one sticky policy may apply to 
multiple pieces of data. In any case, it is necessary to keep the link between the data 
and its related sticky policy. The PII provider may keep track of this interaction in a 
history store. This is helpful, e.g., when a trusted third party audits the PII consumer 
at a later point in time, when the user wants to base a PII selection decision on ear- 
lier decisions, or when the user wants to verify to whom a certain piece of data was 
disclosed and under which conditions. 


21.3.4 PII Consumer 


PII Consumer is the role of entities collecting personal data provided by PII 
Providers. In most scenarios, a service (or data controller) acts as PII consumer. 
PII consumers are in charge of enforcing agreed data handling on collected data. 
Data handling is imposed by the PII Provider and can be refined by the PII 
consumer. The PII consumer’s role is essentially about 1) checking whether actions 
are authorised before acting on collected personal data and 2) enforcing obligations 
regarding those data. 

This section elaborates the box labeled “PII Consumer” in Figure 21.3. Similar to 
the previous section, it illustrates the general steps a PII consumer has to undertake 
internally in order to support the privacy-preserving protocol we described in Sec- 
tion 21.3.2. Again, this figure is intended to be a block diagram, so it is neither just 
a workflow nor a collection of software components, but rather a mixture of both. 
The Meta-data Provider is the matching part to the meta-data request on the PII 
provider side. It provides the requester with a functional description of the service 
and optionally with the SLA. In this context, it is important that meta-data states 
information about certification, the PII requirements, and the privacy policy. The 
information is gathered from the service implementation itself, e.g., it could be part 
of a WSDL document, and from the policy store. The meta-data can be static, which 
means each caller gets the same information, or the meta-data can be dynamic, in 
which case (part of) the information is specific to the context of the request. For 
instance, different callers (PII providers) may get different privacy policies because 
the PII consumer shares individual legal agreements with the various PII providers. 
Another example is that the policy may depend on the geographical region of the 
caller. 
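
The difference between static and dynamic meta-data can be sketched as follows. The dictionary layout, the region codes and the textual policy encoding are assumptions made for this example, and `metadata_for` is a hypothetical helper.

```python
# Static part: identical for every caller.
STATIC_METADATA = {"pii_types": ["email"], "certification": "example-seal"}

# Dynamic part: the policy offered depends on the region of the caller.
POLICIES = {"EU": "retention=30d", "default": "retention=365d"}


def metadata_for(caller_region):
    """Assemble the meta-data returned to a PII provider from a given region."""
    meta = dict(STATIC_METADATA)   # copy, so the static part is never modified
    meta["policy"] = POLICIES.get(caller_region, POLICIES["default"])
    return meta
```

The same lookup could be keyed on the authenticated identity of the caller instead of the region, which would cover the case of individual legal agreements.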

The PII provider digests this information, selects the right PII data, and matches 
preferences and policies. Finally, the PII provider invokes the PII consumer’s 
service and thus submits the requested PII and a sticky policy. The PII consumer must 
verify that the sticky policy matches its policy. This check is necessary to avoid 
service errors or even legal implications through a wrong sticky policy. The sticky 
policy could, for instance, disregard the PII consumer’s policy and define arbitrary 
obligations for the PII consumer. 

Since we consider sending the PII and the sticky policy as a single step (cf. Sec- 
tion 21.3.3), the PII consumer’s answer is the result of the service invocation. 

During or after the execution of the service, the PII consumer may store the data 
and likewise the sticky policy. Moreover, the PII consumer needs to establish a link 
between both data items. That is, the service provider needs to remember which PII 
is associated with which sticky policy and vice versa. The PII is stored in the PII 
store, the sticky policy is kept in the SP store. The link between both is either kept as 
a reference in both stores to allow a bidirectional mapping, or in a dedicated 
data structure that holds references to the respective entries in the PII store and SP store. 

Even after the service invocation, the PII consumer may want to take advantage 
of the collected PII and use this data for an allowed purpose. We distinguish two 
ways of using these data: it might be used for local purposes or it might be passed 
on to a third party. Local use means that the data is used inside the trust domain 
of the PII consumer, e.g., by a second service that operates on the same PII store. 
In this case, the PII consumer has to verify whether or not the data is allowed to 
be used for the given purpose. A different case is passing on the data to a third 
party, i.e., a PII consumer in a different trust domain. This is the moment when 
the PII consumer switches its role to become a PII provider. In both cases, the 
verification of access rights to stored PII is performed by an access control engine, 
which enforces the authorisation. The sticky policy associated with each piece of 
PII helps the authorisation engine to decide whether the user agreed to this purpose 
or not. 
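
A minimal sketch of such an authorisation check is shown below. It assumes that a sticky policy lists the agreed purposes and, for the downstream case, the permitted recipients. The function name, the dictionary keys and the sample values are hypothetical.

```python
def authorise(sticky_policy, purpose, recipient=None):
    """Decide whether stored PII may be used for a purpose, locally or downstream."""
    if purpose not in sticky_policy["purposes"]:
        return False
    if recipient is not None:      # data would leave the trust domain
        return recipient in sticky_policy["allowed_recipients"]
    return True                    # local use for an agreed purpose


sticky = {"purposes": {"billing"}, "allowed_recipients": {"payment-processor"}}

local_ok = authorise(sticky, "billing")
local_denied = authorise(sticky, "marketing")
downstream_ok = authorise(sticky, "billing", recipient="payment-processor")
downstream_denied = authorise(sticky, "billing", recipient="ad-network")
```

In the downstream case a positive decision would be followed by the protocol of Section 21.3.2, with the former PII consumer now acting as PII provider.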

In addition to guarding collected data whenever it is accessed, the PII consumer has 
to follow up on obligations that were agreed upon with the PII provider. It has to 
react to events (scheduled or relevant) and execute appropriate actions. We foresee 
two types of infrastructure components in a PII consumer-side obligation 
enforcement engine. First, there are action handlers. The action handlers are the link to the 
legacy systems in the PII consumer’s IT infrastructure, e.g., a database or mail server. 
Their duty is to execute actions on legacy applications such as sending notifications, 
logging, or deleting data. The second component type is the event handler. It is 
responsible for handling events from legacy systems that are relevant from a privacy 
point of view. Typical events are time-based events and data access events from 
the legacy system. 
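
The interplay of event handlers and action handlers can be sketched as follows. This is an illustration under our own assumptions rather than a prescribed design: the class `ObligationEngine`, the event name `retention_expired` and the action name `delete` are hypothetical, and the legacy system is represented by a plain dictionary.

```python
class ObligationEngine:
    def __init__(self):
        self.action_handlers = {}   # action name -> callable acting on a legacy system
        self.obligations = []       # (event type, data id, action name)
        self.log = []               # executed actions, e.g. for a later audit

    def register_action(self, name, handler):
        self.action_handlers[name] = handler

    def add_obligation(self, event_type, data_id, action):
        self.obligations.append((event_type, data_id, action))

    def handle_event(self, event_type, data_id):
        """Event handler: run every action whose obligation is triggered by this event."""
        for ev, did, action in self.obligations:
            if ev == event_type and did == data_id:
                self.action_handlers[action](did)
                self.log.append((event_type, did, action))


pii_store = {"email": "alice@example.org"}   # stands in for a legacy database
engine = ObligationEngine()
engine.register_action("delete", lambda data_id: pii_store.pop(data_id, None))
engine.add_obligation("retention_expired", "email", "delete")
engine.handle_event("retention_expired", "email")
```

After the event has been handled, the e-mail address is removed from the store and the executed action is recorded in the log.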

An interesting extension is the logging of all obligation enforcement actions and 
access control decisions. A formalised logging would allow a trusted third party, 
such as an accredited auditor, to compare these logging data with the obligations 
in the sticky policy. An automated matching process would enable the auditor to 
see which obligations were kept by the PII consumer and certify the PII consumer 
accordingly. 


21.3.5 Matching Abstract Framework with SOA Requirements 


In this section, we want to compare the abstract framework addressing the lifecycle 
of privacy policies in an SOA from Section 21.3 with the requirements we sketched 
in Section 21.2. The framework clearly focuses on the cross-domain-specific 
requirements (No. 25-30), but it is also a vital infrastructure to achieve core policy 
requirements (No. 1-9) and requirements for additional mechanisms (No. 31-39). 
We will give a short reasoning for each of the covered requirements. 


No. 1: The exchange of machine-readable policies is an essential part of the 
abstract framework. The proposed infrastructure relies entirely on the 
communication of policies between service provider and service consumer. 

No. 2: One way to achieve binding policies is by electronic signature. The 
framework does not explicitly enforce such a mechanism, but the protocol step Mutual 
commitment provides the opportunity for a dedicated commitment protocol. 

No. 3 & 4: The abstract framework is more of an infrastructure, thus it has no user 
interaction. But the protocol cycle of Policy Matching and PII Selection certainly 
needs to present the policy to the user in a human-readable manner. 

No. 5-9: These requirements need to be addressed on the level of the policy lan- 
guage. 

No. 10-18: Although logging is not the core purpose of the abstract framework, it 
foresees a History step on the PII Provider side. On the side of the PII consumer, 
logging would be the duty of the action handlers, because the logging should 
describe what has happened to the PII. Requirements 11-18 specify implementation 
details of the logging mechanism. These are not explicitly addressed by the abstract 
framework, but could be fulfilled by the proper selection of technologies. 

No. 19: Requested PII is described in service meta-data and stored in databases of 
the PII consumer. This should fulfill the requirement, assuming that the semantic 
meaning of the meta-data is carried over to the PII Store. 

No. 20: Non-disputable access is ensured by sticky policies linked to PII in the 
PII Store. 

No. 21: The methodology for granting access to PII is to attach a sticky policy to 
the data that fulfills both the PII consumer’s policy and the PII Provider’s 
preferences. 

No. 22-24: These requirements need to be addressed on the level of the policy 
language. The building block History on the PII consumer’s side helps to achieve 
No. 22. 

No. 25: This is one of the main aspects of the abstract framework. It explicitly 
addresses dynamically adapted SOA. Since sticky policies travel with PII, and PII 
consumers change their role to PII Providers when passing on information, 
the user’s intent always travels with the disclosed information. 

No. 26: The abstract framework does not support the renegotiation of policies, 
but the sticky policy traveling with the disclosed PII is always matched against 
the latest version of PII consumer’s policy. 

No. 27: This requirement is automatically fulfilled when the service adheres to the 
sticky policy communicated along with the data. 

No. 28: The abstract framework is designed to be used in a recursive way. A former 
PII consumer switches to the role of a PII provider when passing on data. 
This ensures multi-level matching. The entire service chain does not need to be 
known a priori. 

No. 29: Informing the user about the whereabouts of his/her data is possible when 
the user expresses in the sticky policy an obligation to notify him/her before the 
data is passed on. 

No. 30: If a service provider changes its policy, the change is relevant only for 
new service requests. Data received under an earlier policy keeps the sticky policy 
that resulted from the original version of the policy. 

No. 31-33: The building blocks Obligation Enforcement and PII Store 
help to fulfill the correction, erasure, and blocking of data. However, the user 
interaction invoking these rights is not part of the framework. 

No. 34: Compliance is one of the main purposes of the obligation enforcement 
mechanism on the PII consumer side. 

No. 35: Trust establishment is not addressed by the abstract framework. 

No. 36: Privacy preferences on the PII Provider side are a core principle of the 
abstract framework. Again, the user interface aspect is not part of this infrastructure, 
but PII Store, Policy Matching, and PII Selection require a proper user interface. 

No. 37-38: Policy matching is again one of the core principles of the abstract 
framework. It is utilised in the building blocks Policy Matching and Check Sticky 
Policy. 

No. 39: This requirement is not supported. The anonymity set could probably be 
expressed as meta-data. 
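
The recursive use of the framework (No. 25 and No. 28) can be sketched in Python as follows. This is only an illustration under stated assumptions: the class names, the two policy elements (purpose and retention), and the matching rule are hypothetical simplifications, not the framework's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StickyPolicy:
    allowed_purposes: frozenset  # purposes the user agreed to
    max_retention_days: int      # upper bound on retention


@dataclass(frozen=True)
class ServicePolicy:
    name: str
    purpose: str
    retention_days: int


def matches(sticky, service):
    """Check one downstream service's policy against the sticky policy."""
    return (
        service.purpose in sticky.allowed_purposes
        and service.retention_days <= sticky.max_retention_days
    )


def pass_on(sticky, chain):
    """Forward PII hop by hop; stop at the first service whose policy does not match.

    The chain is only iterated, never inspected in advance, which mirrors the
    point that the service chain need not be known a priori.
    """
    reached = []
    for service in chain:
        if not matches(sticky, service):
            break
        # This service received the data and acts as PII provider for the next hop.
        reached.append(service.name)
    return reached


sticky = StickyPolicy(frozenset({"recruiting"}), max_retention_days=180)
chain = [
    ServicePolicy("broker", "recruiting", 90),
    ServicePolicy("recruiter", "recruiting", 180),
    ServicePolicy("advertiser", "marketing", 30),
]
reached = pass_on(sticky, chain)
```

Because the same sticky policy is checked at every hop, the user's intent is enforced at each level of the chain, and the data never reaches the advertiser in this example.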


Concluding, we state that the abstract framework addressing the lifecycle of pri- 
vacy policies in an SOA fulfills most of the requirements for privacy in SOAs. Some 
requirements need to be addressed by a wise selection of technologies instantiating 
the abstract framework. The choice of the policy language, for instance, has a major 
influence (No. 5-9, 22-24). Other requirements need to be addressed by means of 
extra infrastructures, such as establishing trust (No. 35). 


21.4 Policy Composition 


In most cases, the data consumer’s privacy policy is shown to the user as natural 
language text, and the user can accept it or not depending on his/her privacy 
preferences. Typically, if the user does not accept it, he/she is not allowed to 
proceed. This process should be automated to enable more complex interactions, 
such as supporting the user in tracking the usage of his/her PII, facilitating the 
enforcement of his/her privacy policy, and dealing with more intricate cases where 
the data exchanged comes from multiple sources, each with its own privacy policy. 
In the latter case, the privacy policies must be aggregated along with the data. To 
this end, the concept of a sticky policy [APS02] is particularly relevant: privacy 
policies are strictly associated with a piece of data and should be composed 
whenever data aggregation happens. 

Expressing a condition for each piece of data could be a means for data providers 
to declare how their personal data should be used. For example, it may be specified 
that the information is not to be transferred to a third party or that it can be used 
for research purposes only. Consider a job applicant who would like to create an 
electronic CV (eCV) that contains up-to-date information on his/her personal details 
(gender, age or race), work experience, academic qualifications and references. This 
personal data could be entered by the user or provided by other services such 
as a university, previous/current employers or an official entity, each of them 
attaching corresponding privacy and access control constraints (e.g., not to be 
released for commercial purposes, or for internal use only), expressed as privacy 
policies. 

The job applicant should be able to compose them into a single document with a 
single privacy policy, and to handle possible conflicts among the policies. On 
the other hand, a job broker service may need to compose job offers from multiple 
sources and aggregate the corresponding policies. The aggregated policy may simply 
be the union of the source policies if they express non-conflicting conditions 
(e.g., one policy defines the purpose as research but does not impose any condition 
on the retention period, and the second policy states a specific retention period 
but has no requirements on the purpose). In the case where two or more policies 
specify the same condition (e.g., each states a different retention period), it may 
still be possible to find an aggregated policy that satisfies all of them; otherwise, 
more complex policy composition mechanisms are needed. Although privacy policy 
languages exist, such as P3P [W3C06], EPAL [BDS04] or PRIME [PRI], they lack 
the notion of sticky policies or the complex composition of services or policies for 
resolving possible conflicts [BZW06]. 


Fig. 21.4: Privacy-aware architecture with policy composition. 


21.4.1 Policy Composition Scenario 


To introduce the challenges related to privacy policy composition, we present an 
illustrative employment scenario in which users and job providers are able to interact 
via different web services. A user (job applicant) would like to create an electronic 
CV (eCV) that contains up-to-date information on his personal details, work 
experience, and academic qualifications. The personal data, such as the person’s 
gender, age or race, could be entered by the user or provided by an official authority 
service. Other types of information may include university degrees, recommendation 
letters and previous or existing employer details, and they could be provided by the 
corresponding organisation/data provider as a signed digital document or as a 
reference (Fig. 21.4). For example, the university certifies the qualifications 
attained, and a recommendation is usually provided by an academic institution 
and/or employer. 

Each contributing data provider could have a rule or a sticky policy attached to 
the data that outlines how the data must be handled when used by the data 
producer (the job applicant), the data consumer (e.g., job broker) or a third party. 
The parts of the eCV that contain this information cannot be altered by the 
applicant, in order to preserve the policy preferences of the different services. 
These constraints, imposed by such data providers, may restrict the exposure of 
information that is related to a company and should not be revealed. For example, 
a policy may allow a recommendation letter to be used only for a specific timeframe, 
or the applicant may allow a certain country, such as the United Kingdom, to 
see his race because it is a prerequisite for processing the application, but not 
permit other countries to see this information if it is not a precondition. 

The final electronic CV is composed of two parts, namely, the composition of 
data emanating from the different sources and the corresponding aggregated 
policies. The policy composition may contain conflicts. For example, the applicant 
may allow his personal contact details to be viewed by all services, whereas the 
company he works for states that, for security reasons, it will not permit 
disclosure of where its employees work. 

Similarly, on the data consumer side (service side), we have a job broker service 
that is composed of offerings proposed by job providers or recruiter services. The 
latter specify rules on data usage conditions; for instance, that they will not pass 
information to other parties, or that they will retain the data for a certain period 
of time. When clients contact the broker with a request for a job offer that 
corresponds to certain search criteria, the broker selects the appropriate recruiter 
and puts it in contact with the eCV service. We assume that the job matching is 
done using the broker’s search engine and that the recruiter service is requesting 
the applicant’s CV (so the job broker does not need to know all the details of the 
CV, but just the search criteria). Subsequently, the policies of the two services 
are matched in a policy engine service (Fig. 21.4). If the data provider’s privacy 
policy matches the constraints outlined by the data consumer, the CV is sent to the 
job provider service. Otherwise, the recruiter’s request is rejected and the applicant 
is notified. The two services should then relax some constraints and try again 
(policy negotiation). 


21.4.2 Privacy Policy Composition Challenges 


The simple scenario described in the previous section outlines some challenges re- 
lated to the policy composition on the data provider/consumer side. We provide here 
a non-exhaustive list. 


• Client vs. server side: since the privacy policy on the server side is published to 
  announce how the data will be handled, whereas the policy on the client side is 
  stuck to the private data to describe how this data should be treated, it is 
  important to distinguish between policies emanating from the client side and 
  those from the server side. Composing and enforcing such policies should be 
  handled separately. 

• Aggregation/Combination: the privacy policy engine must support the aggregation 
  of policies when multiple policies from different sources refer to a specific 
  piece of data. Aggregation in this context refers to a combination of policies that 
  refer to different data handling constraints and do not result in a conflict. In 
  practice, aggregation is represented by a union of a set of policies. Policy 
  composition, on the other hand, is needed when the policies refer to the same 
  element and conflicts may arise. We will discuss the two cases in more detail below. 

• Trusted third party: when the data consumer is composed of different services, 
  the data provider should be able to deal directly with the relevant trusted 
  third party (TTP) and not be obliged to divulge information unnecessarily. This 
  TTP filters the relevant data that will be displayed to the server without any 
  indication of the original dataset or the privacy policy. 

• Negotiation: when the client or the server has a trade-off to make in order to 
  achieve a transaction, it is necessary to find a compromise between conflicting 
  rules. The precedence system should play an important role in automating this 
  requirement. For example, in the context of the scenario, the data producer may 
  be willing to provide PII for research purposes if a criterion such as salary is 
  above a certain threshold. 

• Content-based condition: policy conditions may depend on the content of the 
  data, e.g., data may be used for marketing purposes only if the age, as reported on 
  the CV, is greater than 21, or the job level may be disclosed to a third party only if 
  it is lower than a certain threshold. Such constraints are difficult to address since 
  they introduce an interaction between the data handling policy, which expresses 
  how the data should be used, and the content of the data itself. 


Case     Retention   Purpose 

$p_a$    *           Research 
$p_b$    6M          * 
$p_c$    3M          Marketing 


Table 21.1: Example of the elements of three simple possible policies. * indicates 
that no condition is expressed, thus all possible values are permitted. 


To illustrate the problem of aggregation/composition, let us consider a toy 
example of the combination of two policies with two elements only, {Retention period, 
Purpose}, using the policies of Table 21.1. Let us call the area in the policy space 
(in our example, the {Retention period, Purpose} space) that fulfills the policy 
$p_a$ ($p_b$, $p_c$, respectively) $\mathcal{A}$ ($\mathcal{B}$, $\mathcal{C}$, 
respectively); see Fig. 21.5. For example, policy $p_a$ is fulfilled for any retention 
period on the condition that the purpose is ’Research’; policy $p_b$ is fulfilled for 
any purpose, but with a retention period of 6 months or less (see footnote 4). Let us 
examine some possible combinations: 


Fig. 21.5: Data Usage: Purpose vs. Retention Period. 


Combining policy $p_a$ and $p_b$. These policies have no overlapping elements, and 
the aggregate policy is simply the union of them, $p_a \cup p_b$, which is fulfilled 
for the (non-empty) area of the policy space $\mathcal{A} \cap \mathcal{B}$; see 
Fig. 21.5. We call this process policy aggregation (no conflicts). 

Combining policy $p_b$ and $p_c$. These policies have a common element: the retention 
period. Still, the corresponding areas in the policy space have a non-empty 
intersection (under the assumption detailed in the footnote), 
$\mathcal{B} \cap \mathcal{C} \neq \emptyset$, and a combined policy can be easily 
derived: {Retention period = 3M, Purpose = Marketing}. 

Combining policy $p_a$ and $p_c$. These policies have conflicting elements (Purpose). 
Accordingly, there is no policy that fulfills both of them: 
$\mathcal{A} \cap \mathcal{C} = \emptyset$. The resulting policy cannot be easily 
derived, and a composition rule should be defined, such as a precedence rule (e.g., 
one policy overrides the other because it comes from an authoritative source); 
otherwise, the composition simply cannot take place. 


4 We assume here that if the retention period is set to 6 months, the policy is also 
respected if the actual retention is shorter. This is a reasonable assumption in most 
use cases, but there could still be scenarios where regulations impose a minimal 
retention period. For example, an internet provider may outsource the storage of IP 
addresses to a third party, defining a corresponding privacy policy that indicates as 
retention, say, 12 months and not less, in accordance with the regulations. 
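
The three combinations can be reproduced with a short Python sketch. It is only an illustration: the Policy class and the combine function are hypothetical, None stands in for "*" (no condition), and the retention period is treated as an upper bound, as assumed in the footnote.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Policy:
    max_retention_months: Optional[int] = None  # None plays the role of "*"
    purposes: Optional[FrozenSet[str]] = None   # None plays the role of "*"


def combine(p, q):
    """Intersect two policies; return None when no policy satisfies both."""
    # Retention: with "X months or less" semantics, the stricter bound wins.
    bounds = [b for b in (p.max_retention_months, q.max_retention_months)
              if b is not None]
    retention = min(bounds) if bounds else None

    # Purpose: intersect the permitted sets; "*" is neutral.
    if p.purposes is None:
        purposes = q.purposes
    elif q.purposes is None:
        purposes = p.purposes
    else:
        purposes = p.purposes & q.purposes
        if not purposes:
            return None  # empty intersection: a composition rule is needed
    return Policy(retention, purposes)


p_a = Policy(purposes=frozenset({"Research"}))
p_b = Policy(max_retention_months=6)
p_c = Policy(max_retention_months=3, purposes=frozenset({"Marketing"}))

ab = combine(p_a, p_b)  # aggregation: no shared element
bc = combine(p_b, p_c)  # shared element, non-empty intersection
ac = combine(p_a, p_c)  # conflicting purposes
```

Under a minimal-retention regulation, as in the footnote's counter-example, the min() step would no longer be valid and the retention element would need an interval rather than a single upper bound.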


For policy composition in the general case, rules should be defined. We will not 
go into detail here; we will only examine a possible case of conflict in the 
prototype description, which has been addressed by introducing a simple precedence 
rule. 
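
As an illustration only (the prototype's actual rule is not reproduced here), a precedence rule for a single conflicting policy element could look like the following Python sketch. The function name, the source labels, and the ranking are hypothetical.

```python
def resolve_by_precedence(claims, precedence):
    """Pick the value claimed by the most authoritative source.

    claims:     dict mapping a source name to its value for one policy
                element (e.g. Purpose)
    precedence: list of source names, most authoritative first
    """
    for source in precedence:
        if source in claims:
            return claims[source]
    raise ValueError("no claim from a ranked source")


# Hypothetical conflict on the Purpose element between two sources.
claims = {"applicant": "Marketing", "employer": "Research"}
winner = resolve_by_precedence(claims, precedence=["employer", "applicant"])
```

A fixed ranking of sources is the simplest possible rule; it resolves every conflict deterministically, but it cannot express compromises such as those discussed under negotiation.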


21.4.3 Data-Centric Architecture for Privacy Enforcement 


In this section, we present a data-centric architecture that addresses the 
requirements previously mentioned and involves a data flow that follows a path 
containing services that are able to receive, compose, elaborate, store, and publish 
data from other services. There are two distinct service types, the producer and the 
consumer, and a service sometimes acts as both. A producer stores or produces data, 
while the consumer, which can access all data that is exposed, invokes an operation 
on the producer to access the data that it wants to consume (Fig. 21.7). 


Fig. 21.6: Privacy-aware architecture. 


In the context of the eCV scenario, this structure would be a direct link between 
the eCV and the job broker service, with no policies outlining how the data should 
be handled between the two services. In order to regulate access to the data, there 
is a need to add a privacy control from the perspective of both the data producer or 
provider and data consumer; the former is the eCV while the latter is the job service. 

In Figure 21.6, the Access Profile is a document in which the Consumer declares 
the subject (who it is), resource (which data needs to be accessed) and action (what 
will be done with the data) as well as information on the usage conditions of the 
data, such as retention time and disclosure to third party. Privacy policies are a set 
of rules for accessing the data and are based on the attributes specified in the Access 
Profile; they travel with the data as sticky policies. 

The privacy-aware architecture can proceed as follows: the Consumer Service 
sends an Access Profile request to the Policy Engine; the Engine receives the Access 
Profile request and translates it into an access query call to the Producer Service; the 
latter returns data with sticky policies; the Engine compares the Access Profile with 
the sticky policies in order to enforce the privacy rules over the data; depending on 
the profile of the requester, the Policy Engine selects the parts of the data to be 
displayed or hidden; in the case of a successful match, the Engine sends the data 
back to the Requester Service; otherwise, only a negative answer is returned. 

When the Access Profile contains data aggregated from multiple sources, the 
associated policies should also be collected. Since some of these policies can concern 
common parts of the data, conflicts need to be resolved in order to provide 
coherent and secure policy enforcement. In this case, the Composed Provider (eCV 
service in the scenario) will be in charge of reconciling all of these policies in order 
to obtain a single composed policy satisfying the privacy constraints related to the 
sensitive data. The composed policy is then stuck to the dataset and transmitted to 
the Policy Engine. On the consumer side, in the case of a composite service, privacy 
policies will also be composed and transformed into a single policy by the 
Composed Consumer (Job Broker in the scenario). The Policy Engine will be in charge 
of matching the consumer’s policies with the provider’s preferences. After selecting 
the appropriate consumers, the Policy Engine will enforce the provider’s policy and 
provide the consumers with the data that is permitted to be sent. 
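
The request flow through the Policy Engine can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the dictionary keys of the Access Profile and of the sticky policies ("purpose", "retention_days", "purposes", "max_retention_days") and the sample eCV fields are hypothetical, and the Producer Service is reduced to an in-memory dictionary.

```python
def policy_engine(access_profile, producer_data):
    """Return the fields of producer_data that the access profile may see.

    producer_data maps a field name to (value, sticky_policy); a sticky policy
    is a dict with the hypothetical keys "purposes" and "max_retention_days".
    Returns None to stand for the negative answer.
    """
    released = {}
    for field in access_profile["resource"]:
        if field not in producer_data:
            continue
        value, sticky = producer_data[field]
        # Compare the declared usage conditions with the sticky policy.
        if (access_profile["purpose"] in sticky["purposes"]
                and access_profile["retention_days"] <= sticky["max_retention_days"]):
            released[field] = value  # displayed; all other fields stay hidden
    return released or None


# Data as returned by the Producer Service (here: the eCV service).
ecv = {
    "degree": ("MSc", {"purposes": {"recruiting", "research"},
                       "max_retention_days": 365}),
    "employer": ("ACME", {"purposes": {"recruiting"},
                          "max_retention_days": 30}),
}
profile = {
    "subject": "recruiter-1",
    "resource": ["degree", "employer"],
    "purpose": "recruiting",
    "retention_days": 90,
}
released = policy_engine(profile, ecv)
denied = policy_engine({**profile, "purpose": "marketing"}, ecv)
```

Here the employer field is hidden because the requested retention exceeds what its sticky policy allows, while a request for marketing purposes receives only the negative answer.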


21.4.4 Conclusion 


In a situation where both the data consumer and data provider compose their own 
policies on data usage, it may be necessary to deal with conflicts. This can become 
more complex when a web service is composed of different services and consequently 
has many policies. 

In this section, we looked at privacy policy composition from the perspective of 
the data provider by proposing a scenario that provides an overview of the issues 
involved, and subsequently outlined a data-centric architecture that could be used to 
resolve them. 


21.5 Outlook and Open Issues 


This chapter started with requirements for privacy in Service-Oriented 
Architectures. It collected 39 legal and technical requirements, grouped into five 
categories. These requirements were the starting point for a technical framework 
that brings privacy-enhanced data handling to multi-layered, multi-domain service 
compositions. We described an abstract framework that is technology-agnostic and 
allows for late adoption in existing SOA applications. The adoption could be partial, 
i.e., between just two entities of a larger SOA application. We described the general 
building blocks that are necessary on the PII provider’s side and on the PII 
consumer’s side. Finally, we looked at the technical implementation of a very 
common yet complicated aspect: the composition of policies when composing 
information artifacts. We described how the composition of data influences the 
composition of policies. 

Certainly, this work is far from complete and leaves many open ends for future 
work. One of the most interesting points is to express the abstract privacy 
framework in a semi-formal notation. This would allow for specifying the protocol and 
the relationship between the data items in much more detail. It would even allow 
for describing precisely the complicated relationship between credentials, requested 
PII, preferences, and policies during the PII selection process. The visualisation of 
the PII selection is an important point as well, and one that is only partially 
addressed so far (cf. Chapter 14). Another interesting route is to map existing 
technology to the abstract framework, in order to see how existing technology 
already covers part of the picture. This would give the reader a better basis for 
making a technology choice for his/her SOA. 


Chapter 22 


Privacy and Identity Management on Mobile 
Devices: Emerging Technologies and Future 
Directions for Innovation 


M. Bergfeld and S. Spitz 


Abstract Secure Elements have been around as identity providing modules in Mo- 
bile Services since the creation of the Mobile Phone Industry. With an increasingly 
dynamic environment of Mobile Services and multiple Mobile Devices, however, 
and with an ever changing ecosystem, characterized by new value chain entrants, 
new (partial) identities need to be provided for the end users. Here, emerging Se- 
cure Elements such as Stickers and Secure SD cards can be leveraged in addition to 
the omnipresent SIM card / UICC. For future services though, even more flexible, 
secure and privacy enhanced Secure Elements, such as Trusted Execution Environ- 
ments can be expected. They are needed to cope with an ever more dynamic Mobile 
Services environment that depends upon reliable, partial identities of the end users 
and increasingly calls for privacy and security measures. This chapter elaborates 
upon the emerging and future Secure Element technologies for Mobile Devices. 
These technologies shall allow an increasingly dynamic creation of services be- 
tween front-end Mobile Devices and back-end Servers. The Chapter sets the current 
developments of the ecosystem for Mobile Services into perspective with the needed 
technologies, reflects on the contributions of the PrimeLife project and draws atten- 
tion towards the still needed future directions of innovation. 


22.1 The Status: Privacy and Identity Management on Smart 
Mobile Devices 


With the lifestyle of “Digital Natives” [BT10] spreading throughout the younger 
generations and the “Wisdom of the Crowd” [BT10] gaining relevance in the 
creative activity of private as well as professional collaboration, open technology 
systems that span across numerous individuals and groups are increasingly important. 
While these systems empower the dynamic creation of new services and business 
models, especially through the usage of Mobile Devices such as Mobile Phones, 
Netbooks, Tablet PCs or even Cars (e.g., via their onboard computers), and their 
interaction with back-end Servers in Service-Oriented Architectures (SOA), all these 
opportunities for open collaboration and the rapid sharing of data, information and 
knowledge also increase the challenges for security, privacy and identity 
management. For example: 


• Providing trusted platforms and security for the execution of services between 
  front-end Mobile Devices and back-end Servers. Also, securing these interactions 
  against attacks and interventions. 

• Providing the possibility for multiple (partial) identities in one Mobile Device, 
  i.e., allowing the end user to consciously manage different identities for different 
  audiences. 

• Providing dedicated channels of communication, storage and interaction for 
  partial identities of individuals and assuring that these channels respect the 
  privacy of the individual end user by only making the information transmitted 
  accessible to the respectively intended recipient. 

• Providing solutions of anonymity where applicable without jeopardising 
  authentication. 


This chapter elaborates upon the emerging technologies for Mobile Devices that 
are expected to allow an increasingly dynamic creation of services between front- 
end Mobile Devices and back-end Servers, including the use of technologies that 
promote security, allow identity management and enhance privacy. 


22.2 The Changing Context (I): Multiple Partial Identities across 
Devices 


It has been pointed out that scalable privacy models in conjunction with different 
levels of security are important for identity management (see [Pri08a]). Most of the 
underlying technologies such as cryptography are available (see [SSP08]), although 
they might have to be tailored to the particular needs of identity management. 

Further, it has been shown that individuals tend to leverage different identities 
or roles in their lives, when acting in, for example, a professional context (e.g., as 
an employee), in the context of a customer-company interaction (e.g., as a bank 
client), in the context of special interest communities (e.g., in online gaming) or in 
the context of their family and close friends. The structure of present technologies 
and the users’ interaction with these, however, have been found to conflict with the 
natural segregation of audiences and therefore blur the existence of partial identities 
and the privacy considerations attached to these; the natural segregation of audi- 
ences has largely been lost through present technologies such as centralised social 
networks and storage systems (see [vdBL10]). 

Consequently, emerging technologies that strive to empower privacy-enhanced 
and identity management-enabled services will need to satisfy the following 
requirements: 

• Enable numerous privacy and identity settings for different audiences, 
  potentially also life-long (see [Pri09]). 

• Be scalable across technology platforms (e.g., across Mobile Phones, Netbooks, 
  Tablet PCs or even Cars, via their onboard computers). 

• Integrate private and professional interaction (e.g., private and professional email) 
  and multiple partial identities (e.g., for different Mobile Services such as 
  ticketing, banking, loyalty programs, social communities or government services) 
  into one Mobile Device. 

• Work seamlessly when private and professional services are accessed on the 
  move, e.g., across the entertainment system of the car when driving, the private 
  Mobile Phone, or the company Laptop. 


22.3 The Changing Context (II): Multiple Identity Providing 
Stakeholders Along an Increasingly Dynamic Mobile 
Services Value Chain 


Looking at the ecosystem through which Mobile Services are provided, numerous 
stakeholders can be identified: 


• There are design entities for the Central Processing Units / platforms of the 
  devices. 

• There are producers of these CPUs. 

• There are handset manufacturers. 

• There are Mobile Network Operators (MNOs). 

• There are Service Providers, for example providing additional applications for 
  ticketing, banking, loyalty programs, social communities or government services. 

• There are Service Enablers, serving as (trusted) third parties to ensure and secure 
  a seamless execution of Mobile Services. 


Initially, the MNOs were the dominant players with regards to security and the 
identity involved in the provision of Mobile Services. They provided the identity of 
the individual user when registering with the network through the identity embed- 
ded in the SIM (Subscriber Identification Module) card. With the limited scope of 
identity-related Mobile Services being available in the past, security, privacy, and 
identity management topics were embedded into rather static environments: 


• Fixed and concrete client and server components, actors and scenarios 
  constituted the service environment. 

• Single identities were sufficient for a predefined set of Mobile Services. 

• Time was sufficient to develop and modify client and server components when 
  new security, privacy and identity challenges arose. 


At present, the other stakeholders along the Value Chain of Mobile Services are 
becoming increasingly involved: Handset manufacturers interact with their users 
directly via online client portals and provide applications for them (e.g., 
Blackberry.net and the Blackberry App World, Apple iTunes, Nokia Ovi, etc.). MNOs 
are moving in similar directions and a plethora of new Service Providers offer their 
applications via the open interfaces provided to them by the handset manufacturers. 

Many of these direct interactions with the end user of the Mobile Devices include 
security, privacy and identity management issues. 

At present, however, the industry approach to managing these issues is highly 
fragmented. Many stakeholders intend to provide proprietary solutions on propri- 
etary Secure Elements and prefer to secure access to the private data of their end 
users for their proprietary purposes. In order to tackle this trend, any intention to 
provide a consistent approach for security, privacy and identity management for 
Mobile Devices would need to be flexible and would need to integrate with the 
rapidly changing context of the Mobile Services industry. The presently developing 
industry context can be described as follows: 


• Heterogeneous scenarios are omnipresent and security requirements vary strongly. 

• Multiple identities (partial identities) will need to be empowered, without the 
  predefined set of Mobile Services being known. Hence, rule-awareness rather 
  than fixed requirements needs to be implemented in order to embed security, 
  privacy and identity management-aware behaviour in the overall system of Mobile 
  Services. 

• The equipment needs to be context-aware and flexibly follow the overarching 
  rules under different situations. 

• Time is insufficient to develop and modify client and server components when 
  new security, privacy and identity challenges arise. Hence, the security, privacy 
  and identity management needs will “co-evolve with the [...] steadily changing 
  context into which it is embedded” [Pri08a]. 


This need for flexibility in providing security, privacy and identity management 
will increase even further in the future. In addition, open interfaces will be needed 
for Mobile Services that leverage multiple stakeholders at the moment of deliv- 
ery and higher storage capacities will be required for more data intensive services. 
Hence, future solutions in this area will need to be highly dynamic/adaptive to ever 
changing environments: 


“This will consider unknown equipment, actors, and heterogeneity of space. 
The definition of SoS will result in security policies. The client and the server 
will know the policies [..., and] take a “flexible and secure,” “pervasive and 
secure,” “resilient and secure,” “recoverable and secure” character, depending 
on the situation.” [Pri08a] 


Given that Smart Private Mobile Devices need to be designed according to the 
above described dynamic context in which they are used, the question arises how 
Mobile Services can provide security, privacy and identity management for the mul- 
tiple stakeholders along the Value Chain, the multiple technology platforms involved 
in servicing the end user (e.g., the CPU of a Laptop, a Netbook, a Mobile Phone or 
the navigation and entertainment system of a car) and the highly dynamic environment.

Briefly, one could ask “How can a Mobile Ecosystem for secure, privacy- 
enhanced and identity management-enabled services be designed and provided?” 

It has been shown elsewhere that modularisation through interfaces, the stan- 
dardisation of these and the collaboration across corporate boundaries are essential 
characteristics to provide innovative systems that comply with highly complex and 
dynamic market and technology environments [Ber09]. Secure Elements (SEs) are 
such highly modularised and standardised technologies. Different SEs can be tailored to
different levels of flexibility, openness and storage capacity (see Figure 22.1).


[Figure 22.1 arranges Secure Elements along an axis of increasing storage capacity,
flexibility and openness. Its labels read: Static Mobile Services (technology status:
existing), Sticker; Flexible Mobile Services (technology status: developing); Highly
Dynamic Mobile Services (technology status: emerging), TEE.]

Fig. 22.1: Secure Elements between static and dynamic Mobile Services. 


Because SEs play an important role for identity management and privacy in the 
context of Mobile Services, and because a choice between different SEs directly 
influences the extent to which identity management and privacy can be provided, a 
brief description of exemplary Secure Elements is given below. 


22.4 Technologies for Identity Management and Privacy 
Enhancement: Secure Elements 


Secure Elements (SEs) are platforms, particularly for Mobile Devices, on which 
Applications can be installed, personalised and managed. Increasingly, this can be 
done over-the-air (OTA). With recent¹ technology developments, OTA provisioning
of Applications is done via a Trusted Service Manager (TSM). This helps to adapt
formerly static SEs more flexibly for new Mobile Services and Applications.²


1 “Recent” refers to the introduction of such services from 2006/7 onwards.

Further, SEs are seen to potentially provide a “safe resort” for value-intensive, 
critical Applications that use significant professional and private data, especially as 
the environment for Mobile Devices and the services provided via these is increas- 
ingly challenged with risks of data theft [Bac10], espionage [Rt10], and security 
breaches [Web10]. 

On a conceptual level, SEs can be categorised into three different areas: 


• Removable SEs (e.g., Stickers, Secure Micro SD cards and UICCs³)

• Non-removable SEs (e.g., embedded SEs)

• SEs from a combination of software programs on dedicated hardware (e.g.,
Trusted Execution Environments).⁴


The history of Secure Elements and the capabilities of Smart Cards and Tokens
in the context of Secure Dynamic Mobile Services have already been analysed
elsewhere (see [Pri08a]).

Further, the security in embedded systems and the different virtualisation
technologies have been analysed, the usability aspects of Secure Environments
have been commented upon, and the applicable cryptography has been reviewed
(see [SSP08]).

In essence, it has been shown that SIM / UICC cards, for example, have advan- 
tages for operator-specific, static and highly secure identification tasks, and embed- 
ded security systems. In comparison, virtualisation technologies are more applicable 
for highly dynamic service provisioning (see [Pri08a]). 

In addition to the well established Smart Card/SIM/UICC technologies, some 
additional Secure Element technologies have emerged successfully in the Mobile 
Ecosystem, whilst others have not spread so widely. As a result, the present context 
of SE technologies for Mobile Devices appears as shown in Figure 22.2. 

A selection of the above-mentioned SEs also supports increasingly Dynamic Mo- 
bile Services — without sacrificing security. 

Technologies such as Secure µSD cards, Stickers and selected embedded Secure
Elements have seen rather wide uptake in the market, because they enable new Mobile
Services in the Value Chain for established stakeholders and/or were accessible
to new players for the establishment of new Mobile Service concepts.⁵

Secure Micro SD cards (µSD) and the Trusted Execution Environment are of
particular relevance, as they combine increased security with increased flexibility.
Because these two SEs can also be used as storage and processing platforms for the
identification of individuals and their credentials, they are particularly relevant for
privacy-enhanced and identity-management-enabled services that need to be highly
secure and flexible.


2 For practical examples, see, e.g., www.venyon.com or www.smarttrust.com

3 A UICC is a Universal Integrated Circuit Card, i.e., a type of Subscriber Identification Module
(SIM) used in 3G UMTS devices.

4 In the case of the TEE, the SE consists of a physical module, e.g., a partition of the CPU and
software embedded into this physical module (e.g., a secure Operating System). For a detailed
elaboration on the different categories of SEs.

5 Also see [Mob10] for a detailed analysis of the Mobile Value Chain, with particular attention to
Mobile Financial Services.


Fig. 22.2: Various Secure Elements as “Private World” in Mobile Devices.

Further, they offer largely open interfaces within their architecture in order to 
promote a rapid uptake by existing and new stakeholders along the Mobile Services 
value chain. 

However, initiatives to embed Trusted Platform Modules into Mobile Devices
have not succeeded on a wide basis. Apparently, the economic incentive for the
different stakeholders along the value chain of the Mobile Services industry remained
unclear, and fragmented business interests along the value chain were not orchestrated
for a systemic solution [Mob10]. Neither has the potential option to integrate
an additional Smart Card reader into Mobile Devices found wide acceptance.⁶

In order to review existing Secure Element technologies and subsequently intro- 
duce emerging technologies for future innovations in the field, selected technology 
examples are introduced in the following sections. 


6 The additional effort and costs for the handset manufacturer, who largely operates based on the
requirements of the Mobile Network Operators (MNOs), were not consistently required and called
for along the Mobile Services Value Chain. Similarly, a requirement for an additional Smart Card
reader was not commonly agreed upon by all players in the Value Chain.
22.5 Present Secure Element Technologies: UICCs and Stickers


22.5.1 The Universal Integrated Circuit Card (UICC) and the 
Smart Card Web Server 


In second-generation mobile networks (2G), the SIM card was the physical Smart
Card used to control access of mobile devices to the MNO network. In third-generation
networks (3G), this physical component is called UICC. UICCs use Java-based
Operating Systems and they increasingly include additional Applications such as 
information-on-demand menus, SIM-based browsers, mobile banking Applications 
or ID credentials for other Mobile Services. 

For 15 years, the SIM/UICC has been an important SE in the world of mobile 
communications. It is unique in taking over the global market as the one exchange- 
able authentication token in GSM and UMTS networks. In the past few years, this 
SE has been enhanced by many more functionalities than simple user authentica- 
tion. The SIM/UICC has evolved to become a central medium for the storage and 
administration of user data in the provider network. 

Moreover, the SIM/UICC has been enabled to exchange information directly
with the mobile phone user and the provider network using additional mechanisms
such as the Card Application Toolkit (CAT) and Over the Air (OTA) communication,
and also interacts with web technologies such as HTML (hypertext markup language)
pages and HTTP (hypertext transfer protocol) by leveraging the Smart Card
Web Server (SCWS) technology. This enables the SIM/UICC to protect a wide range of
data such as music, video clips, purchased ring tones, and personal data on the Mobile
Device. Hence, the UICC is developing to become the internet-enabled network
node that seamlessly integrates into other IP networks. At present, it is the premier
link that creates confidence among the mobile phone user, the network operator, and
service providers in the operator's network.

The specific capabilities of UICCs in the context of dynamic and secure mobile
services have already been analysed elsewhere. Here, advantages of UICCs for
operator-specific, static and highly secure identification tasks have been pointed
out and have been compared with embedded security systems and virtualisation
(see [Pri08a]). In essence, it was shown that:


“Smart Cards and Tokens provide high security in a mobile and flexible manner.
Embedded Security Mechanisms and Virtualization may provide significant
processing power for security relevant applications” [Pri08a]. Also, these
two still independent capabilities could be combined
into “a system which can cover all levels of security, be static as well as
flexible and highly performing. As an example, such systems could provide
Smart Cards or Tokens for mobile devices which can store different identities
and assist in using selected ones of these for different services such as
payments, bookings or participation in online communities. The Embedded
Security Mechanism would assist in decoding and processing the data stored
on the Smart Card or Token, thus making the overall system secure and highly
performing.” [Pri08a]


With regards to security, privacy and identity management, the UICC ranks as
highly secure, but it does not by design offer technologies that drive privacy
protection and enable partial identities for services outside the direct identification
towards the mobile network. In fact, the most important role of the UICC remains
what it was initially conceptualised for: identifying the individual user to one
service partner, the Mobile Network Operator.

At present, the identity that the UICC provides is not yet intensively leveraged for 
other Mobile Services in order to provide their additional offerings to the end-user. 


22.5.2 The Sticker as Example for Static Mobile Service Identities 


“Stickers” are self-adhesive contactless cards or tags that can be stuck on the back of
Mobile Devices. Although very similar to a standard contactless Smart Card,
they have a specifically designed antenna combined with a ferrite backing layer to
cut distortion to and from the phone's components and its radio signal. With this
antenna, Stickers connect to Near Field Communication (NFC) terminals to enable
NFC payments.

Currently, there are two forms of Stickers: Passive Stickers⁷, which are not
connected to the handset's Application execution environment, i.e., the Operating
System (OS), and Active Stickers, which are connected to the OS, for example via
Bluetooth.

Passive Stickers are widely available at present⁸ and, for example, serve as
empowering technologies for the payment of small amounts, just as a debit or credit
card would do.⁹ Active Stickers¹⁰ that would allow more flexibility in the provisioning
and adaptation of partial identities are being tested for market introduction, and
mass-market availability is expected in the course of 2010-11.

With regards to security, privacy and identity management, Passive Stickers pro- 
vide one set of identification data (e.g., a debit/credit card number) that is inflexible 
and serves the need for privacy in the same manner as a normal credit card would. 


7 “Passive Stickers” have no connection to the Operating System of the mobile device. Therefore,
they neither allow dynamic Application management, be it by a TSM for Application updates or
by the consumer for additional services via a phone's user interface, nor do they offer the full NFC
use case range or the provisioning of multiple applications after they are distributed.

8 Passive Stickers have been mass-produced in millions of units since Q1 2009 for payment and
loyalty Applications.

9 For practical examples, see, e.g., www.blingnation.com.

10 “Active Stickers” are connected to the handset application execution environment, for example,
via a Bluetooth connection. Hence, they would be able to offer flexibility regarding the
adoption of different identities. OTA provisioning and life cycle management by a TSM is possible
for Active Stickers because of their connection to the phone. The end customer may also manage
his/her MFS Applications via the phone's user interface.
Active Stickers are an emerging technology that may be capable of offering flexibility
with regards to the adaptation of partial identities in the future, depending on
successful market trials.


22.6 Emerging Secure Element Technologies: Trusted Execution 
Environments and the Privacy Challenge 


The appeal of SEs will be particularly high to the respective service providers if 
they rapidly, easily and seamlessly integrate with applications provided by Third 
Parties in the market place. Quick diffusion can be expected, if the Secure Elements 
in the Front-End enable these Third Parties to embed security, privacy and identity- 
management into their solutions ad hoc. Further, a pre-certification of the Secure 
Elements with regard to security, privacy and identity-management may still en- 
hance market acceptance, as it would provide independent solution and application 
providers with a “dock-on” method to security, privacy and identity-management. 
To achieve this goal, clear and open interfaces will be essential. Further, hardware,
software, interfaces and protocols need to interplay in order to
enable the secure storage and usage of credentials for increasingly sophisticated Mobile
Services. For this, technologies such as the ARM TrustZone may be leveraged,
because of their dominant design in the marketplace for Mobile Device platforms.

The so-called Trusted Execution Environments (TEEs) are striving to provide the 
above-mentioned characteristics. 

In order to remain highly flexible and adaptive to changes in the environment of
Mobile Services, TEEs strive for independence from the Rich-OS. This is particularly
important as increasingly open Rich-OS systems diffuse in the Mobile Devices,
e.g., Google's Android. Modern TEE approaches can be used on a wide range
of TrustZone systems, especially if they are equipped with a clean and easy-to-understand
integration interface. Here, reference drivers can be leveraged that help TEEs
integrate with specific Operating Systems, such as Google's Android.

TEEs provide security, privacy and identity-management solutions that enable
new types of services. TEEs address the need for flexible, powerful and efficient
security solutions in various forms of Mobile Devices. TEEs can, for
example, be based on ARM TrustZone enabled chipsets (so-called SoCs). TEEs
utilise ARM TrustZone's division of the SoC into two distinct areas, a “Public
World” and a “Private World” as shown in Figure 22.3. TEEs then provide open
interfaces in order to enable the development of dedicated applications with security,
privacy and identity-management capabilities.

In this concept of “Public and Private Worlds,” the TEEs encapsulate security-,
privacy- and identity-management-relevant parts of an application in the dedicated
“Private World.” Those parts of the application that are not security-, privacy- or
identity-management-relevant remain in the “Public World.”

Two clear and open interfaces between the “Public” and the “Private World”, the
so-called TEE Client Application Programming Interface (API) and the TEE Internal API,
enable application providers to dock on to the concept. Leveraging these two APIs
empowers them to offer secure services to the market without having to go into the
details of security and privacy protection or the specifics of identity-management.

The TEE Client API and the TEE Internal API follow a lightweight approach,
meaning they are easy to use and easy to understand. Hence, developers can
concentrate on the design of their business logic. TEEs are also integrated into different
SoCs in order to diffuse quickly to the different Mobile Devices. In short, these two
interfaces and the concept of TEEs offer a highly modularised, non-complex and
easy-to-use Secure Element on Mobile Devices, which empowers rapid deployment
and constant adaptation of security-, privacy- and identity-management-enhanced
solutions for, e.g., Mobile Phones, Netbooks, Tablet PCs or even Car navigation and
entertainment systems.
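The split of an application into a “Public World” part and a “Private World” part can be sketched as follows. This is a simplified model and not the TEE Client API itself: the class names, the command identifier and the `invoke` method are hypothetical, the HMAC merely stands in for whatever credential operation a real Application Module would perform, and Python cannot enforce the hardware isolation that a TrustZone-based TEE provides.

```python
import hashlib
import hmac
import secrets


class PrivateWorld:
    """Stands in for the isolated side; only it ever holds the credential."""

    CMD_SIGN = 1  # hypothetical command identifier

    def __init__(self) -> None:
        self._credential = secrets.token_bytes(32)

    def invoke(self, command: int, payload: bytes) -> bytes:
        """Single narrow entry point, playing the role of the client interface."""
        if command == self.CMD_SIGN:
            return hmac.new(self._credential, payload, hashlib.sha256).digest()
        raise ValueError("unknown command")

    def verify(self, payload: bytes, tag: bytes) -> bool:
        """Check a tag; in practice a back-end holding the same key would do this."""
        expected = hmac.new(self._credential, payload, hashlib.sha256).digest()
        return hmac.compare_digest(tag, expected)


class PublicWorldApp:
    """Business logic in the Rich-OS; it sees results, never the credential."""

    def __init__(self, private_world: PrivateWorld) -> None:
        self._session = private_world

    def build_order(self, amount_cents: int, payee: str) -> dict:
        message = f"{payee}:{amount_cents}".encode()
        tag = self._session.invoke(PrivateWorld.CMD_SIGN, message)
        return {"message": message, "tag": tag}
```

The point of the sketch is the narrow interface: the developer of the public part writes business logic and delegates every credential operation through one call.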


[Figure 22.3 labels: Application; security-, privacy- and identity-management-enhanced
Application Modules; TEE Client API; TEE Internal API; TEE Driver; Integration Layer;
ARM TrustZone enabled SoC.]


Fig. 22.3: Overview of the “Public” and “Private World” and the Interfaces in TEEs. 


In order to complement the existing Secure Elements in an adequate way, TEEs 
are characterised by four points: High performance, low footprint, provable security 
and certifiability. 

Looking further into modern applications on Mobile Devices and their value for
the individual User (e.g., mobile banking applications, mobile social networking
and mobile loyalty programs), the need to protect these interactions and to assure the
adequacy of the information exchanged via Mobile Devices becomes increasingly
clear. At present, two security gaps remain for Mobile Devices, especially Mobile
Handsets:


• The input of data in a trusted manner (e.g., without interference while typing
on the keyboard or the touchscreen), and

• the output of data in a trusted manner (e.g., without the display of manipulated
data over the screen of Mobile Devices), see Figure 22.4.


[Figure 22.4 labels: Eyes read display (Trusted Output); fingers type information
(Trusted Input); Input + Output = Secure User Dialogue.]

Fig. 22.4: Trusted Input + Output = Secure User Dialog. 


TEEs can provide a Secure User Dialogue by assuring that any input will be 
transmitted in a secure way via the “Private World.” Also, TEEs can ensure that 
the keypad is disconnected from the “Public World” while the individual is reading 
trusted information on the screen. However, the usage of the secure keypad functions 
does not yet prevent an attacker from writing a “Public World” sniffer that could 
grab keypad data that is intended for the “Private World.” Therefore, a secure keypad 
application needs to be combined with a secret that is stored in the “Private World.” 
Nevertheless, such solutions assist in providing the individual with input and output
that are more secure than with any other SE existing at present, and that are also
privacy- and identity-management-enhanced.
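One possible reading of combining the secure keypad with a secret stored in the “Private World” is a personal indicator that only the trusted dialogue can display. The following sketch assumes this reading; the indicator phrase, the class and the function names are hypothetical and do not describe a particular product.

```python
import hmac
import secrets


class PrivateWorld:
    """Holds a user-chosen indicator that never leaves the isolated side."""

    def __init__(self, indicator: str) -> None:
        self._indicator = indicator

    def open_trusted_dialog(self, prompt: str) -> dict:
        # Only the trusted dialogue can show the stored indicator.
        return {"prompt": prompt, "indicator": self._indicator}


def public_world_spoof(prompt: str) -> dict:
    """An imitation dialogue drawn by a Public World sniffer; it has to guess."""
    return {"prompt": prompt, "indicator": secrets.token_hex(4)}


def user_accepts(dialog: dict, remembered_indicator: str) -> bool:
    """The user types only if the dialogue shows the expected indicator."""
    return hmac.compare_digest(dialog["indicator"], remembered_indicator)
```

Under this assumption, a sniffer can imitate the layout of the keypad dialogue but not the indicator, so the user has a cue for refusing to type.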


22.7 Technologies for Secure and Dynamic Mobile Services and 
the Privacy Challenge in Highly Dynamic Environments 


Based on the above examples of Secure Element technologies, it becomes evident
that static technologies such as the Passive Stickers provide a Secure Element for
selected Mobile Services (e.g., NFC functionalities of Credit Card payments), but do
not correspond to an environment that would call for a highly flexible provisioning
of partial identities. Passive Stickers have one or a set of preinstalled identities
(e.g., a credit card number), but cannot be provided with new, partial identities
over-the-air in a flexible manner.
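The difference between a static and a flexibly provisioned Secure Element can be made concrete with a small model. It is a sketch under simplifying assumptions: a Secure Element is reduced to a flag for over-the-air capability and a dictionary of partial identities, and all identifiers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SecureElement:
    kind: str
    ota_capable: bool
    # Maps an audience (e.g., "payment") to the partial identity used for it.
    identities: Dict[str, str] = field(default_factory=dict)

    def provision(self, audience: str, identifier: str) -> None:
        """Add a partial identity over-the-air, if this SE type allows it."""
        if not self.ota_capable:
            raise PermissionError(f"{self.kind} cannot be provisioned over-the-air")
        self.identities[audience] = identifier


# A Passive Sticker ships with its identity; other SEs can receive more later.
passive_sticker = SecureElement("Passive Sticker", False, {"payment": "card-0001"})
micro_sd = SecureElement("Secure Micro SD card", True)
```

Calling `micro_sd.provision("airline", "ff-42")` succeeds in this model, whereas the same call on `passive_sticker` raises `PermissionError`.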

Further, privacy is only partially provided, as the Passive Sticker would interact
with a terminal, which would then route the communication through an additional
network. Hence, there is no “private” end-to-end communication between the
individual and the recipient of the message (e.g., a payment via Credit Card), but
processing networks are involved in “getting the message across.”

In between the Passive Stickers and the TEE, one can position a recently intro- 
duced technology called the Secure Micro SD Card. 

With secure µSD cards, flexibility is given and partial identities as well as privacy
can be assured. Here, partial identities can be provided over-the-air for different
relationships and audiences (e.g., a partial identity for traveling with frequent flyer
programs, another partial identity for interacting with a bank, a third partial identity
for customer loyalty programs, et cetera). Further, these partial identities can be
combined with unique keys at both ends of the communication channel (i.e., VPN-like
architectures), so that a communication channel that is linked to one of the
partial identities remains “private” because it is encrypted and can only be read by
the counterpart for this partial identity and not the processing network in between
(Nota bene: Also see the PrimeLife demonstrator with the PrimeLife Application
running in a Privacy-PIN protected Private World on an SD Card, encrypting and
decrypting messages to a specific head hunter account).
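The principle of one unique key per partial identity can be sketched with the Python standard library alone. The construction below (a SHAKE-256 keystream plus an HMAC tag) is a toy chosen only to keep the example self-contained; it is an assumption of this sketch, not the scheme used in the demonstrator, and a deployed system would rely on a vetted authenticated cipher.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and authentication keys from one identity key."""
    return hashlib.sha256(label + key).digest()


def seal(key: bytes, plaintext: bytes) -> Tuple[bytes, bytes, bytes]:
    """Encrypt and authenticate a message under the key of one partial identity."""
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(_subkey(key, b"enc") + nonce).digest(len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce, ciphertext, tag


def unseal(key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Only the holder of the matching identity key gets the plaintext back."""
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or modified message")
    stream = hashlib.shake_256(_subkey(key, b"enc") + nonce).digest(len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Hypothetical partial identities, each with its own key shared with one counterpart.
identity_keys = {"bank": secrets.token_bytes(32), "airline": secrets.token_bytes(32)}
```

A message sealed under the "bank" key can be opened by the bank alone; the processing network in between, or the counterpart of another partial identity, sees only the nonce, the ciphertext and the tag.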

This could, for example, be used for the provisioning of “private” health informa- 
tion from an insurance company to a patient via mobile phones, which is impossible 
today in the U.S. because privacy cannot yet be assured. 
Fig. 22.5: Dynamics, Security and Privacy relating to the analysed technologies.


For TEEs, most categories are fulfilled in the same manner as for the SD cards.
In addition, TEEs can provide a direct link to the hardware, e.g., a mobile phone and 
its keypad and display, and can thus assist in even making the input and output of 
information trustworthy, private and identity-related (see previous section for more 
details). 

For example, a secure User Interface (UI), which includes a secure display and 
keypad, will assure that the input, e.g., an amount for a money transfer or the need 
for a new medicine prescription, can only be read by the respective Application 
that is linked to the partial identity (e.g., a bank account or pharmacy), and is then 
encrypted and wired through the network to the recipient in a privacy-enhanced 
manner. 

For all of these technologies, however, anonymity as an additional building block
for enhanced privacy remains an open issue. For example, Mobile Devices have
unique identities in the networks over which they communicate, provided by the
log-on of the SIM card (i.e., the subscriber identification module) to the networks,
once the user has unlocked it with the corresponding Personal Identification Number.
Hence, the network can identify the individual Mobile Device. Further, the usage profile
correlated to the respective device provides further insight into the individual's
preferences and interests. Thus, although the above-mentioned technologies can fully or
partially provide trusted, identity-related and private means of communication and
interaction, the individual device, and therefore also its user, is not anonymous.
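Why encrypted content alone does not yield anonymity can be shown with a few lines. The sketch assumes that an observer in the network sees, for each interaction, a stable subscriber identifier together with the contacted service; the event format and all values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (subscriber identifier, partial identity in use, contacted service)
Event = Tuple[str, str, str]


def link_by_subscriber(events: Iterable[Event]) -> Dict[str, Set[Tuple[str, str]]]:
    """Group observed interactions by the device's stable network identity."""
    profiles: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for subscriber_id, partial_identity, service in events:
        profiles[subscriber_id].add((partial_identity, service))
    return dict(profiles)
```

Even if every channel is encrypted per partial identity, all interactions of one device collapse into a single profile in this model, which is the linkability described above.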

In summary, existing and emerging Secure Element technologies already provide a
valid development in the direction of secure, identity-enabled and privacy-enhanced
Mobile Services.

However, certain privacy challenges remain. These should be the future direc- 
tions of innovation in the area of Smart Private Mobile Devices and Services. 


22.8 Contributions of the PrimeLife Project for the 
Advancement of Technologies in the Field 


The PrimeLife project advanced the technology for identity management and pri- 
vacy on mobile devices in the following areas: 

Firstly, it conceptualised how infrastructures for Mobile Services of the future
need to be structured in order to allow for highly dynamic Mobile Services
(see [Pri08a]). Here, it was found that the existing technologies work well for
rather static Mobile Services (also see section on UICC and Stickers above) but
solutions for more flexible environments are missing.

Subsequently, G&D developed the front-end for a PrimeLife demonstrator and
used a secure µSD card as a more flexible and open Secure Element for this [Pri08b].

Applying the scenario of the eCV, the demonstrator created the following, highly
dynamic Mobile Service as an interaction between the Mobile Device, a “Private
World” embedded into the secure SD card, and the server back-end [Pri08b,
pp. 24-25]:
• “A portal (i.e., a back-end server) manages potential privacy and identity
management related conflicts and sends requests that contain conflicts to the “Private
World” of a user's Mobile Device.”

• “The user then decides whether he/she wishes to use the requesting service or
not, based on logging in to her/his “Private World” (via the Privacy-PIN) in the
Secure Element (e.g., Secure µSD Card or TEE) on the Mobile Device.”

• “The “Private World” on the Mobile Device can “freeze” the Back-end if the end
user/job applicant does not accept a policy mismatch.”

• “The “Private World” on the Mobile Device can deactivate the data set on the
Back-end for a selected time period, e.g., if the job applicant does not want his
private data to be visible for others online because he does not want to be
approached by any job offering entity.”

• “The “Private World” on the Mobile Device can directly interact with the Back-end
in a secure manner (e.g., via a Virtual Private Network or via encrypted
communication) for data control in the future.”

• “The “Private World” on the Mobile Device holds the essential service keys for
numerous privacy- and identity-management enhanced services and is therefore
the privacy and identity-controlling device in the palm of each end consumer.”

• “The “Private World” on the Mobile Device provides a secure compartment/the
TEE in which customisable services are empowered and in which additional data
can be stored, e.g., additional certificates to enhance the eCV even more in
selected cases.”

• “The “Private World” on the Mobile Device can “glue” other Secure Elements
such as the SIM, the SD card and others together, if these are needed as sources of
partial identities to provide more complete identity sets for particular services.”
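The first four interactions listed above can be summarised in a small state model. It is a sketch based only on the behaviour quoted from [Pri08b]; the class names, the state strings and the method names are assumptions made for illustration and do not reproduce the demonstrator's code.

```python
import hmac
from dataclasses import dataclass


@dataclass
class BackEnd:
    """Stands in for the portal; 'state' is a hypothetical status field."""
    state: str = "active"


class PrivateWorld:
    """User-controlled side in the Secure Element, unlocked by the Privacy-PIN."""

    def __init__(self, privacy_pin: str, back_end: BackEnd) -> None:
        self._pin = privacy_pin
        self._back_end = back_end
        self._unlocked = False

    def login(self, pin: str) -> bool:
        self._unlocked = hmac.compare_digest(pin.encode(), self._pin.encode())
        return self._unlocked

    def _require_login(self) -> None:
        if not self._unlocked:
            raise PermissionError("Privacy-PIN required")

    def decide(self, policy_mismatch: bool, accept: bool) -> bool:
        """Answer a request from the portal; freeze the back-end on a refusal."""
        self._require_login()
        if policy_mismatch and not accept:
            self._back_end.state = "frozen"
            return False
        return True

    def deactivate_data_set(self) -> None:
        """Hide the data set on the back-end, e.g., while not looking for a job."""
        self._require_login()
        self._back_end.state = "deactivated"
```

The model makes one design property visible: every change of the back-end state originates from the unlocked “Private World”, and never from the portal.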


The PrimeLife demonstrator therefore enabled more flexible Mobile Services, 
distributed between the SE on the front-end Mobile Device and the back-end Server. 

Nevertheless, a highly dynamic composition of Mobile Services was only partially
possible and additional features of privacy (e.g., a secure User Interface, as
shown in Figure 22.4) remained an open issue. Further, the secure µSD card as
selected SE was not available in all mobile devices and was rather a plug-in solution.

Hence, G&D focused on the next level of SEs that would enable not only flexible,
but highly dynamic services: Trusted Execution Environments [Pri08b]. Here, G&D
provided and assured the standardisation of the TEE Client API, which enables
the exchange of data between the rich OS Applications of Mobile Devices and the
security- and privacy-enhanced part (i.e., the “Private World”) of the Application.
Based on the standardisation in the GlobalPlatform consortium and an open TEE
Client API, developers can now define which data shall be interchanged between
the “Public” and the “Private World,” and in which manner. They can do so for
the Mobile Services of any stakeholder along the value chain and can also provide
their applications to the Secure Element in a highly flexible manner by using
over-the-air service providers that “load” the identity-management-enabled and
privacy-protected applications into the Mobile Devices in the field.

Through the design of this interface and its standardisation in alignment with the
PrimeLife project, G&D empowered the Mobile Services environment with access
to a highly modularised, simple and easy-to-use Secure Element, which empowers
rapid deployment and constant adaptation of security-, privacy- and
identity-management-enhanced solutions for, e.g., Mobile Phones, Netbooks, Tablet PCs or
even cars.


Fig. 22.6: Dynamic Mobile Services in the PrimeLife Demonstrator.


22.9 The Privacy Challenge in Mobile Services and Future 
Directions for Innovation 


The above-elaborated status quo of the Mobile Services ecosystem and the Secure 
Element technologies has drawn an up-to-date picture of the technical capabilities 
and has shown which issues in the area of security, identity and privacy are presently 
being solved, amongst others via the PrimeLife project. 

However, selected areas remain unsolved. They can be summarised in the fol- 
lowing roadmap for further innovation in the field: 


• The requirement for highly dynamic adaptation to the ever-changing market and
technology environments can be addressed through the different SE alternatives.
The less dynamic the SEs need to be, the more one can expect that existing
solutions will be leveraged. The more dynamic Mobile Services shall be designed,
the more likely emerging SEs such as Secure SD cards and TEEs will be used.

• The Security / trust requirement is inherently solved by the usage of SE
technologies.

• The requirement for numerous, partial identities is predominantly addressed by
the emerging SEs, such as Secure µSD cards and TEEs. It can be expected that
Mobile Devices will leverage multiple SEs in the future, each hosting different
partial identities.

• Privacy, as in the secure communication between predefined communication
partners that relate to the respective partial identities (e.g., secure one-to-one
communication between the end user and the eCV portal, between the end user
and a banking/payment entity, or between the end user and a selected loyalty
program provider such as an airline), can be assured through the emerging SE
technologies.

• Anonymity, as in the unlinkability of the end user to his or her respective actions,
remains an open issue. Here, technologies such as IDMix and Direct Anonymous
Attestation could prove very valuable if they were combined with the
above-mentioned SEs. These could counteract the present unique identity of Mobile
Devices (e.g., via their Subscriber Identity) and the linkability of this identity to
the usage profile in Applications that still largely reside in the “Public” instead
of the “Private” world.

• Further, TEEs are not yet as strictly isolated as, for example, UICC/SIM cards
and their certification for the different use cases still needs to be provided. In
addition to the open question of anonymity, this needs to be addressed through
future research.


In essence, the existing and emerging technologies are a significant step in the
direction of providing more (partial) identity-related and privacy-empowered Mobile
Services. Nevertheless, open points such as anonymity, certification and isolation
remain to be solved in future directions of innovation.
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Fig. 22.7: Remaining privacy challenges and future directions for innovation. 
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Abstract Telcos face an elementary change in their traditional business model. The 
reasons for this are manifold: Tougher regulations, new technology (most notably 
VoIP and open spectrum), matured core business markets (voice and messaging), 
new market entrants or advancing customer demands and expectations. A potential 
direction of this change is business models that concentrate on the exploitation and 
monetisation of the huge amount of customer data that results from the usage of 
traditional communication services (data, voice). Based on these data, telcos’ long- 
standing relationships to their customers, and infrastructural assets and capabili- 
ties, telcos are a reasonable candidate for assuming the role of identity management 
service providers (IdMSPs). This chapter describes a method to evaluate privacy- 
enhancing IdM Services from the perspective of a telco acting as prospective IdM 
Service Provider. The basis for the evaluation method is formed by the concept of 
Identity Management Enablers, which are used to analyse and describe the services 
and scenarios on which the decision supporting method is based.


23.1 Introduction 


Telcos face an elementary change in their traditional business model. The reasons
for this are manifold: tougher regulations, new technology (most notably VoIP and
open spectrum), matured core business markets (voice and messaging), new market
entrants, and advancing customer demands and expectations. Telcos are forced into
decision-making about new business models. A potential direction of this change is
business models that concentrate on the exploitation and monetisation of the huge
amount of customer data that results from the usage of traditional communication
services (data, voice). One of several potential future business models is the
provision of identity management services to third-party service providers. Based
on these data, Telcos’ longstanding relationships with their customers, and
infrastructural assets and capabilities, Telcos are a reasonable candidate for assuming
the role of identity management service providers (IdMSPs). For these reasons, we
focus on Telcos in the role of IdMSPs instead of other potential Internet-based
service providers (e.g., Facebook, Google, Amazon). But like other organisations,
Telcos have concerns about the economic motivations to invest in privacy-enhancing
identity management services [FaRi08]. This chapter describes the construction and
application of a decision support method that Telcos can use to decide whether to
invest in the provision of privacy-enhancing identity management services. The
method has seven steps, some of which are structured following established economic
methods. The ongoing work described in the following sections is an initial design
of this method.

Section 23.2 introduces the IdM enabler concept and provides a step by step 
description of the method, where each step is followed by an illustrative use case 
example. Section 23.3 describes further use case examples for the method. Sec- 
tion 23.4 gives an overview of related approaches in this area. Finally, Section 23.5 
briefly discusses the benefits and limitations of the method and gives an outlook on 
further potential developments. 


23.2 Economic Valuation Approach for Telco-Based Identity 
Management Enablers 


This chapter introduces a decision support approach that can be applied by Telcos in 
order to decide whether to invest in the provision of privacy-enhancing identity man- 
agement services. The basis for the evaluation method described below is formed 
by the concept of Identity Management Enablers. Identity Management Enablers 
are used to analyse and describe the services and scenarios on which the decision 
supporting method is based. They consist of a valuable combination of IdM-related
customer data assets² and functional capabilities. Data assets in this context are
attributes of a user identity (e.g., of an end customer) such as name, place of birth,
account details, and so forth. Functional capabilities are functions that process these
data assets to provide IdM services (e.g., age verification, authentication). A
combination of IdM-related functional capabilities and identity-related data assets is
called an IdM Enabler in the following and should be seen as a driver for a specific
IdM Service (see Figure 23.1).
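
The enabler concept can be made concrete with a minimal sketch. It assumes that a data asset is a named attribute value and that a functional capability is a function over that asset. The class and function names are hypothetical and not part of the chapter. The threshold of 21 years follows the example in Figure 23.1.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class DataAsset:
    """An attribute of a user identity held by the IdMSP (hypothetical type)."""
    name: str
    value: object


@dataclass(frozen=True)
class IdMEnabler:
    """A data asset combined with the functional capability that processes it."""
    service: str
    asset: DataAsset
    capability: Callable[[DataAsset, date], bool]

    def serve(self, today: date) -> bool:
        # Only the yes/no result leaves the enabler, never the raw asset.
        return self.capability(self.asset, today)


def verify_age_at_least_21(asset: DataAsset, today: date) -> bool:
    """Functional capability: does the birthdate imply an age of at least 21?"""
    born = asset.value
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    return age >= 21


enabler = IdMEnabler(
    service="age verification",
    asset=DataAsset("birthdate", date(1990, 5, 17)),
    capability=verify_age_at_least_21,
)
print(enabler.serve(date(2011, 5, 16)))  # one day before the 21st birthday
print(enabler.serve(date(2011, 5, 17)))  # on the 21st birthday
```

The point of the split is that the same data asset can drive several services by pairing it with different capabilities.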

The evaluation approach can be used as a decision support instrument for po- 
tential providers of privacy-enhancing IdM services and consulting agencies acting 
in this domain. It is focused on decision situations, where an IdM service provider 
(IdMSP) has to decide 


• between investing in a privacy-enhancing IdM Service or not,

• in which one of at least two alternative privacy-enhancing IdM services to invest.


2 Data collected due to the provision of traditional communication services.


[Figure 23.1: a service request is answered by a service provision. The IdM
functional capability (verification) processes the customer data asset (birthdate)
and returns “YES - Verified Age >= 21”.]

Fig. 23.1: The IdM Enabler Concept.


The method consists of the following seven process steps, which are described in
more detail in the subsequent sections:

1. Description of the baseline option and feasible delta options by scenarios.
2. Identification of each stakeholder’s costs and benefits.
3. Selection of each stakeholder’s key costs and benefits.
4. Mapping of each stakeholder’s key costs and benefits to the IdMSP by cause-effect
   chains.
5. Clustering of the IdMSP’s costs and benefits.
6. Assessment and aggregation of the IdMSP’s clustered costs and benefits.
7. Visualisation of the IdMSP’s aggregated costs and benefits.


To demonstrate each step of the approach in a more pragmatic and less abstract
way, the steps are consistently applied to the following exemplary use case: An
IdMSP has to decide whether to invest in the provision of a new privacy-enhancing
age verification service for end customers of online casino providers.


23.2.1 Description of the Baseline Option and Feasible Delta
Options


In the first step, the IdMSP describes the status quo of the examined identity
management service. This mainly comprises a description of how a specific identity
management service is currently implemented in practice by other service providers.
This status quo scenario is called the Baseline Scenario (BS). Accordingly, the
Baseline Option (BO) represents the alternative not to provide any of the available
IdM services at all, and needs to be considered as one possible decision.

After the description of the Baseline Scenario, the IdMSP needs to describe all 
alternative implementation scenarios of the IdM service that shall be considered 
to enhance the Baseline Scenario. These alternative scenarios are here called the 
Delta Scenarios (DS). Further, these Delta Scenarios must be mutually exclusive. 
Analogous to the Baseline Option, a Delta Option (DO) represents the alternative to 
invest in one of the Delta Scenarios. 


Use Case - Age Verification Scenario 


In this example, the decision maker (the IdMSP) identified two alternative designs
for an enhanced age verification service: Delta Scenario 1 (DS 1) and Delta Scenario
2 (DS 2). In this case, the IdMSP has the following options to act in the identity
management ecosystem: invest in DO 1, invest in DO 2, or do not invest at all (BO).
The IdMSP’s decision for the BO would leave the state of the resulting environment
unchanged as shown in Figure 23.2.

In this example, the end customer of an online casino needs to provide the online 
casino provider with a valid proof of his age. Here, the end customer provides this 
information by, e.g., entering his date of birth into a special web form. This process 
has to be replicated for any age-based service the end customer wants to use. 

Opting for DO 1 would result in the modified market situation represented by
Delta Scenario 1 (DS 1) (Figure 23.3). To use the age verification service, the end
customer needs to create an account with the IdMSP and needs to provide a valid
proof of his date of birth. This will usually involve an external age verification
process. After successful registration with the IdMSP, the IdMSP provides a verified
legal age certificate. The end customer can use this certificate at any point in
time, without the involvement or the knowledge of the IdMSP, to prove his legal
age to the online casino provider. The end customer can
request additional verified legal age certificates to be presented to other age-based
services. Alternatively, the IdMSP could issue generic, service provider independent
credentials, e.g., in the form of an anonymous credential [CL01].

Opting for DO 2 would result in the modified market situation represented by Delta
Scenario 2 (DS 2) (Figure 23.4).

[Figure 23.2: sequence diagram between Online Casino Provider and End Customer:
1. Request Service; 2. Request Date of Birth; 3. Provide Date of Birth; 4. Check Age;
5. Grant Service Access.]

Fig. 23.2: Age Verification Baseline Scenario.

Fig. 23.3: Age Verification Delta Scenario 1.

Fig. 23.4: Age Verification Delta Scenario 2.


In Delta Scenario 2 (DS 2), the end customer needs to create an account with
the IdMSP and provide a valid proof of his date of birth with the help of an external
age verification process. After successful registration, when the end customer
wants to use the online casino service, he provides the online casino provider with a
reference to his age verification provider. The online casino provider then requests
verified age information from the IdMSP. Thus, the end customer is not involved in
the age verification process.
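
The three scenarios differ mainly in which party learns what. The following sketch makes that difference explicit under simplifying assumptions: the legal age of 18 is assumed (the chapter does not fix one for the casino case), the function names are hypothetical, and the "learned" sets are an illustrative reading of the scenario descriptions rather than a formal privacy analysis.

```python
from datetime import date

LEGAL_AGE = 18  # assumed threshold for this sketch


def of_legal_age(born: date, today: date) -> bool:
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1) >= LEGAL_AGE


def baseline(born, today):
    """The customer types the date of birth into the casino's web form."""
    learned = {"casino": {"date of birth"}, "idmsp": set()}
    return of_legal_age(born, today), learned


def delta_scenario_1(born, today):
    """The IdMSP issues a certificate once; the customer presents it himself."""
    learned = {"casino": set(), "idmsp": {"date of birth"}}
    certificate = {"of_legal_age": of_legal_age(born, today)}
    learned["casino"].add("of legal age: yes/no")
    return certificate["of_legal_age"], learned


def delta_scenario_2(born, today):
    """The casino receives a reference and queries the IdMSP itself."""
    learned = {"casino": {"reference to IdMSP account"}, "idmsp": {"date of birth"}}
    learned["idmsp"].add("which service the customer uses")
    learned["casino"].add("of legal age: yes/no")
    return of_legal_age(born, today), learned


born, today = date(1985, 3, 2), date(2011, 6, 1)
for scenario in (baseline, delta_scenario_1, delta_scenario_2):
    granted, learned = scenario(born, today)
    print(scenario.__name__, granted, learned)
```

In this reading, DS 1 keeps the date of birth away from the casino and the service usage away from the IdMSP, whereas DS 2 lets the IdMSP observe which service is used.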


23.2.2 Identification of each Stakeholder’s Costs and Benefits 
Based on Delta Scenarios in Comparison to the Baseline 
Scenario 


The anticipated impacts and the expected costs and benefits of a specific scenario are 
crucial factors for decision making. Therefore, the corresponding costs and benefits 
need to be identified for all delta scenarios. During this step, the Baseline Scenario 
has to be taken as the reference value (the baseline). That allows for the prediction 
and evaluation of the consequences of the Delta Scenarios in the form of costs and 
benefits. This step of the method can be performed by experts or by the usage of
an appropriate explanatory model. As the costs and benefits of the IdMSP partially
depend on the costs and benefits of the other market players (service providers, end
customers), these also have to be anticipated and evaluated in this step (Figures 23.5
and 23.6).

Fig. 23.5: Cost-Benefit list for Delta Option 1 vs. Baseline Option.

Fig. 23.6: Cost-Benefit list for Delta Option 2 vs. Baseline Option.

Fig. 23.7: Key Costs and Benefits for Delta Option 1 vs. Baseline Option.

Fig. 23.8: Key Costs and Benefits for Delta Option 2 vs. Baseline Option.


Use Case - Age Verification Scenario


The identified costs and benefits are described in a way that they express the ex- 
pected economic changes that result from introducing a Delta Scenario. Each iden- 
tified cost and benefit has an influence on the overall benefit that is expected. 


23.2.3 Selection of Key Costs and Benefits for each Stakeholder 


To reduce the overall complexity, in this step the IdMSP has to reduce the set of
costs and benefits to a subset of key costs and key benefits for each stakeholder. The
IdMSP excludes all costs and benefits it does not consider relevant for its decision
making.³ As the following steps of the method (steps 4 - 7) are based on this reduced
subset of costs and benefits, the selection of key costs and key benefits is crucial for
the overall result.


Use Case - Age Verification Scenario 


Figure 23.7 and Figure 23.8 show the result of the exemplary application of this step 
of the method. Costs and benefits that are considered as less relevant are crossed out. 


23.2.4 Mapping of each Stakeholder’s Key Cost and Benefits on 
IdM Service Provider by Cause-Effect Chains 


The key costs and benefits identified for each stakeholder in a Delta Scenario have
to be mapped to the IdMSP by cause-effect chains. The central idea of cause-effect
chains is to create a model of the resulting costs and benefits that explicitly
considers their interdependencies. A single cost or benefit of a market player causes
economic effects on the respective market player itself and on all other players of
that ecosystem. The aim of the cause-effect chains is to let all economic effects
of the other market players flow into the IdMSP’s costs and benefits. As a result,
the IdMSP obtains a set of mapped costs and benefits representing the economic
consequences caused by the other market players.
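
One way to picture a cause-effect chain is as a directed graph whose nodes are (stakeholder, effect) pairs. The sketch below follows the edges breadth-first and collects every effect that lands on the IdMSP. The edges in `CHAIN` are hypothetical and only loosely inspired by the labels in the use case figures. The function name `map_to_idmsp` is an assumption of this sketch, not terminology from the method.

```python
from collections import deque

# (stakeholder, effect) -> effects it causes; all edges are illustrative.
CHAIN = {
    ("end customer", "data minimisation"): [
        ("service provider", "less information about end customers"),
    ],
    ("service provider", "less information about end customers"): [
        ("service provider", "less potential for advertising"),
    ],
    ("service provider", "less potential for advertising"): [
        ("idmsp", "fewer service providers"),
    ],
    ("end customer", "additional registration fees"): [
        ("idmsp", "fewer new end customers"),
    ],
}


def map_to_idmsp(key_effects, chain, target="idmsp"):
    """Follow cause-effect edges and return the effects that reach the target."""
    seen = set(key_effects)
    queue = deque(key_effects)
    mapped = []
    while queue:
        node = queue.popleft()
        for caused in chain.get(node, []):
            if caused in seen:
                continue  # each effect is counted once, even if reached twice
            seen.add(caused)
            if caused[0] == target:
                mapped.append(caused[1])
            queue.append(caused)
    return mapped


key_effects = [
    ("end customer", "data minimisation"),
    ("end customer", "additional registration fees"),
]
print(map_to_idmsp(key_effects, CHAIN))
```

The traversal mirrors the mapping order of the method: end customer effects first propagate to the service provider, and both then propagate to the IdMSP.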


3 Note that the result of this step is highly dependent on the decision maker’s individual valuation
of each cost and benefit.


Use Case - Age Verification Scenario


All key costs and benefits derived in Step 3 (Section 23.2.3) will now be mapped
step by step to the IdMSP (Figure 23.9 and Figure 23.10):

• Mapping end customer’s key costs and key benefits to other costs and benefits of
the end customer.

• Mapping end customer’s costs and benefits to costs and benefits of the service
provider.

• Mapping end customer’s and service provider’s costs and benefits to costs and
benefits of IdMSP.

The last two columns in the tables in Figure 23.9 and Figure 23.10 show the results
of this step.


[Figure 23.9: table “Delta Option 1 vs. Baseline Option” with Costs and Benefits
columns for End Customer, Service Provider and IdM Service Provider. Legible
entries include: additional efforts for hardware and/or software; additional
registration fees and/or charges for service usage; data minimisation (more privacy);
less information about End Customers; fewer possibilities for commercialisation of
user data; less potential for advertising, personalisation; fewer revenues; additional
efforts for development and operation of hardware and/or software for End
Customers; additional efforts for development and operation of the Service
Infrastructure (data bases, etc.); fewer new End Customers.]

Fig. 23.9: Cause-Effect-Chain for Delta Option 1 vs. Baseline Option.


23.2.5 Clustering of Mapped IdM Service Provider Costs and
Benefits


After mapping all costs and benefits to the IdMSP, Step 4 will usually result in
a large set of different costs and benefits with a variety of scale units. To reduce
complexity and ease the process, these need to be clustered by equal scale units or
by dimensions such as revenues, costs, or risks, into a set of decision-relevant
factors that is as small as possible.
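
As a minimal sketch of the clustering step, the mapped IdMSP effects can be grouped by dimension with a lookup table. The assignment in `DIMENSION_OF` is hypothetical. In practice the decision maker chooses the dimensions and decides which effect belongs where.

```python
from collections import defaultdict

# Hypothetical assignment of mapped IdMSP effects to dimensions.
DIMENSION_OF = {
    "fewer end customers": "revenues",
    "less end customer loyalty": "revenues",
    "additional efforts for end customer hardware/software": "costs",
    "additional efforts for service infrastructure": "costs",
}


def cluster(effects, dimension_of):
    """Group effects by dimension; unknown effects are kept visible."""
    clusters = defaultdict(list)
    for effect in effects:
        clusters[dimension_of.get(effect, "unassigned")].append(effect)
    return dict(clusters)


effects = [
    "fewer end customers",
    "additional efforts for service infrastructure",
    "less end customer loyalty",
    "regulatory uncertainty",  # not in the table, so it ends up unassigned
]
print(cluster(effects, DIMENSION_OF))
```

Keeping an explicit "unassigned" cluster avoids silently dropping effects that do not yet fit a chosen dimension.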
[Figure 23.10: table “Delta Option 2 vs. Baseline Option” with Costs and Benefits
columns per stakeholder. Legible entries include: less privacy because of additional
knowledge of the IdMSP; fewer possibilities for commercialisation of user data; less
potential for advertising, personalisation, profiling, targeting etc.; reduced
motivation to sign up for the service; fewer new End Customers; additional efforts for
development and operation of the Service Provider interface; additional efforts for
development and operation of the Service Infrastructure (data bases, etc.).]

Fig. 23.10: Cause-Effect-Chain for Delta Option 2 vs. Baseline Option.

[Figure 23.11: IdM Service Provider costs and benefits for Delta Option 1 vs. Baseline
Option (fewer End Customers; less End Customer loyalty; additional efforts for
development and operation of hardware and/or software for End Customers;
additional efforts for development and operation of the Service Infrastructure) and
their effects on dimensions. Annotation: costs and benefits are grouped by similar
scale units or by critical success factors relevant for the achievement of individual
objectives of the decision maker; positive and/or negative effects on each dimension
are identified.]

Fig. 23.11: Clustering Costs and Benefits of Delta Option 1 vs. Baseline Option.


Use Case - Age Verification Scenario


The clustering of all costs and benefits by similar scale units or by critical success
factors that are relevant for the achievement of the IdMSP’s individual objectives
results in a set of costs and benefits that is easier to handle. With this clustering, the
effects of a group of similar costs or benefits will be represented by a single effect
in an aggregated form (see Figure 23.11 and Figure 23.12). For example, more new
service providers and a higher degree of service provider loyalty will result in more
revenue.


[Figure 23.12: IdM Service Provider costs and benefits for Delta Option 2 vs. Baseline
Option (fewer Service Providers; less Service Provider loyalty; additional efforts for
development and operation of the Service Provider interface; additional efforts for
development and operation of the Service Infrastructure) and their effects on
dimensions, with the same annotation as Figure 23.11.]

Fig. 23.12: Clustering Costs and Benefits of Delta Option 2 vs. Baseline Option.

[Figure 23.13: effects on dimensions for Delta Option 1 vs. Baseline Option: more
revenues (high) and fewer revenues (low) aggregate to more revenues (medium);
higher costs (low); lower risks (low). Annotation: grouped costs and benefits are
assessed and aggregated on the decision maker’s dimensions; this is the initial point
for deducing a decision based on the preferences of the decision maker.]

Fig. 23.13: Aggregating Costs and Benefits of Delta Option 1 vs. Baseline Option.

[Figure 23.14: effects on dimensions for Delta Option 2 vs. Baseline Option: more
revenues (high) and fewer revenues (low) aggregate to more revenues (medium);
higher costs (low); lower risks (medium).]

Fig. 23.14: Aggregating Costs and Benefits of Delta Option 2 vs. Baseline Option.

[Figure 23.15: chart of delta values (low, medium, high on either side of the baseline)
for Delta of Revenues, Delta of Costs and Delta of Risks. Panel text: “Delta Option 2
vs. Baseline Option: medium positive value from increased revenues; low negative
value from increased costs; medium positive value from decreased risks”.
Annotation: aggregated costs and benefits are visualised on the decision maker’s
dimensions; the decision maker needs to know its preferences for each dimension in
order to deduce a decision.]

Fig. 23.15: Visualising Costs and Benefits of Delta Option 1 vs. Baseline Option.


23.2.6 Assessment and Aggregation of Clustered IdM Service
Provider Costs and Benefits


The effects resulting from the cost and benefit clustering in Step 5 (Section 23.2.5)
can be positive or negative and of different importance to the IdMSP. Therefore the
effects need to be aggregated to an overall effect for each of the chosen dimensions.
During the aggregation, each effect needs to be individually weighted by the IdMSP.
Where applicable, this can be done by adding concrete values or ranges of values
for each effect, but usually the aggregation will be based on appropriate scales or
grades defined by experts of the IdMSP, such as very good (+ +), good (+), medium
(0), bad (-), and very bad (- -).

[Figures 23.16 and 23.17: charts of delta values (low, medium, high on either side of
the baseline) for Delta of Revenues, Delta of Costs and Delta of Risks. Panel text in
both: “Delta Option 2 vs. Delta Option 1: low positive value from increased revenues;
low positive value from decreased costs; low positive value from decreased risks”.]

Fig. 23.16: Visualising Costs and Benefits of Delta Option 2 vs. Baseline Option.

Fig. 23.17: Delta Option 2 vs. Delta Option 1.


Use Case - Age Verification Scenario


Based on the results of Step 4 and Step 5, the IdMSP now assesses the intensity of
each effect on a dimension by using the abstract value classes high negative
(- - -), medium negative (- -), low negative (-), Baseline (0), low positive (+), medium
positive (+ +), and high positive (+ + +). For example, for the comparison between
DS 1 and the BS (Figure 23.13), the IdMSP rates the effect more revenues as
medium positive and the effect fewer revenues as low negative. In the end, the IdMSP
expects the effect more revenues to have low positive intensity when it opts for DO
1. Based on the IdMSP’s preferences for each dimension and the results shown in
Figure 23.13, the IdMSP can now deduce the decision whether or not it should
provide its age verification service as represented by DO 1.
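
The aggregation in this example can be reproduced with a small ordinal scale. The sketch assumes that the seven value classes map to the integers -3 to +3 and that effects on one dimension are combined by equally weighted addition, clamped to the scale. The method itself leaves the weighting to the IdMSP, so the additive rule is an assumption of this sketch.

```python
SCALE = {
    "high negative": -3,
    "medium negative": -2,
    "low negative": -1,
    "baseline": 0,
    "low positive": 1,
    "medium positive": 2,
    "high positive": 3,
}
LABEL = {value: name for name, value in SCALE.items()}


def aggregate(ratings):
    """Combine the ratings of all effects on one dimension into one class."""
    total = sum(SCALE[rating] for rating in ratings)
    total = max(-3, min(3, total))  # stay within the seven value classes
    return LABEL[total]


# Revenue dimension of DO 1 as rated in the text:
# "more revenues" is medium positive, "fewer revenues" is low negative.
print(aggregate(["medium positive", "low negative"]))
```

With these ratings the sum is 2 - 1 = 1, which corresponds to the low positive intensity stated in the text.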


23.2.7 Visualisation of Aggregated IdM Service Provider Costs and 
Benefits 


Finally, the aggregated costs and benefits will be visualised in order to further sim- 
plify complex decision situations to support the IdMSP. 


Use Case - Age Verification Scenario 


Visualisation example (see Figure 23.15 and Figure 23.16): Based on the results of
Step 6 (Section 23.2.6), the IdMSP should also evaluate the relative advantages of its
DOs against each other, as shown in Figure 23.17.
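
A text-only sketch of such a visualisation is given next. It assumes that each option is summarised as one integer per dimension on the -3 to +3 scale, seen from the IdMSP's perspective (higher costs count as negative value). The values for `do1` and `do2` are illustrative and are not meant to reproduce the figures exactly. The relative comparison is computed as a simple difference, which is an assumption of this sketch.

```python
DIMENSIONS = ("revenues", "costs", "risks")


def bar(value):
    """Draw a value as a bar left (negative) or right (positive) of the baseline."""
    left = ("-" * max(0, -value)).rjust(3)
    right = ("+" * max(0, value)).ljust(3)
    return f"{left}|{right}"


def render(title, deltas):
    lines = [title]
    for dimension in DIMENSIONS:
        lines.append(f"{dimension:<9}{bar(deltas[dimension])}")
    return "\n".join(lines)


def relative(option_a, option_b):
    """Delta of option_a against option_b on every dimension."""
    return {d: option_a[d] - option_b[d] for d in DIMENSIONS}


do1 = {"revenues": 1, "costs": -1, "risks": 1}  # illustrative values vs. baseline
do2 = {"revenues": 2, "costs": -1, "risks": 2}  # illustrative values vs. baseline

print(render("DO 1 vs. BO", do1))
print(render("DO 2 vs. BO", do2))
print(render("DO 2 vs. DO 1", relative(do2, do1)))
```

The last chart shows only the differences between the two delta options, which is the comparison the decision maker needs once both options beat the baseline.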


23.3 Description of the Identity Management Scenarios 


The method presented in Section 23.2 can be applied to a variety of identity
management services. Step 1 of the method requires the IdMSP to describe the Baseline
Scenario and the Delta Scenarios. In Section 23.2, a first identity management
enabler scenario with its baseline and delta scenarios has been presented (age
verification). To give more examples for the design of Baseline and Delta Scenarios, this
section presents two additional identity management enabler scenarios
(authentication, privacy policy enforcement) that can be evaluated with the method. For both
scenarios, the Baseline Scenario and two Delta Scenarios are presented.


23.3.1 Authentication


Authentication is an essential identity management function in online and offline 
scenarios. Due to its nature, implementations of authentication mechanisms have to 
be reliable and secure. Therefore, different authentication types exist for different 
scenarios. In the following, three possible authentication designs are presented, the 
baseline option and two possible delta options. 


[Figure 23.18: sequence diagram between Online Casino Provider and End Customer:
1. Request Service; 2. Request Authentication; 3. Response Authentication (Username,
Password); 4. Process Authentication; 5. Grant Service Access.]

Fig. 23.18: Authentication Baseline Option.


23.3.1.1 Baseline Option - End Customer Provides Username & Password 


The baseline option illustrated in Figure 23.18 represents the most commonly used
authentication scheme in online scenarios. Before a session starts, the end
customer provides his username (pseudonym) together with the password to the
service provider. The service provider then authenticates the user. Thus, the service
is enabled by the user providing the authentication credentials (IdM data asset) and
the service provider processing the authentication (IdM functional capability).


23.3.1.2 Delta Option 1 - Telco Forwards Authentication Code to End
Customer


A more sophisticated authentication scheme is Delta Option 1, illustrated in
Figure 23.19. The best-known implementations of this multi-factor authentication
scheme can be found in online banking scenarios (mobile TAN, smsTAN, mTAN).
In Delta Option 1, the Telco is the trusted third-party IdMSP. In the first step,
after the end customer requests the service, the service provider requests the Telco to
forward an authentication code to the mobile phone of the end customer (1). The
authentication code is generated randomly on-the-fly and is valid for a short time
frame. After the end customer receives this authentication code (2), he presents it
as the second authentication credential to the service provider (the first
authentication step was providing a username and password combination to the
service provider).

This authentication scheme requires that the service provider and the Telco
negotiate a commonly used identifier for the respective user in the initial
registration phase. Here, the essential IdM data asset (mobile phone number of the end
customer) comes from the Telco, while the IdM functional capability (processing the
authentication) is on the service provider’s side. This scheme follows Privacy by
Design principles, since the phone number of the end customer is not shared with the
service provider.
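
The code handling on the service provider's side might look like the following sketch. It assumes a six-digit code, a validity window of 300 seconds and one-time use, none of which is specified in the text. `CodeIssuer` and `send_via_telco` are hypothetical names. The callback stands in for the Telco, which alone maps the shared identifier to the phone number.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 300  # assumed validity window


class CodeIssuer:
    """Service-provider side: generates one-time codes and checks them."""

    def __init__(self, send_via_telco, clock=time.monotonic):
        self._send = send_via_telco  # callable(user_id, code), run by the Telco
        self._clock = clock
        self._pending = {}           # user_id -> (code, expiry time)

    def start(self, user_id):
        code = f"{secrets.randbelow(10**6):06d}"  # random six-digit code
        self._pending[user_id] = (code, self._clock() + CODE_TTL_SECONDS)
        self._send(user_id, code)    # only the shared identifier is passed on

    def verify(self, user_id, presented):
        # pop() makes every code usable at most once
        code, expiry = self._pending.pop(user_id, (None, 0.0))
        if code is None or self._clock() > expiry:
            return False
        # constant-time comparison of the expected and the presented code
        return hmac.compare_digest(code.encode(), presented.encode())


outbox = {}  # stands in for the Telco's delivery to the mobile phone
issuer = CodeIssuer(send_via_telco=outbox.__setitem__)
issuer.start("alias-42")
print(issuer.verify("alias-42", outbox["alias-42"]))  # code presented in time
print(issuer.verify("alias-42", outbox["alias-42"]))  # same code presented again
```

The service provider never sees the phone number: it only knows the identifier negotiated with the Telco during registration.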


23.3.1.3 Delta Option 2 - Telco Provides Authentication Data 


Delta Option 2 (Figure 23.20) is a generalised single sign-on scenario. The user 
has an (SSO-)account with the Telco. If he wants to consume a specific service, he 
authenticates to the Telco. The Telco then provides the user an authentication token, 
which the end customer then forwards to the service provider. In this scheme, the 
essential IdM functional capability (processing the authentication) is implemented 
and provided by the Telco. 
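
A token of this kind can be sketched with a keyed hash. The example assumes that the Telco and the service provider share a secret key from the registration phase and that the token carries a pseudonym, an audience and an expiry time. The text prescribes neither the format nor the cryptography, and deployed single sign-on systems typically rely on public-key signatures instead. All function and field names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def issue_token(key: bytes, pseudonym: str, audience: str, expires_at: int) -> str:
    """Telco side: bind pseudonym, audience and expiry together with a MAC."""
    payload = json.dumps(
        {"sub": pseudonym, "aud": audience, "exp": expires_at}, sort_keys=True
    ).encode()
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(tag).decode()
    )


def verify_token(key: bytes, token: str, audience: str, now: int):
    """Service-provider side: return the pseudonym, or None if the token is invalid."""
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong number of parts or broken base64
        return None
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    claims = json.loads(payload)
    if claims["aud"] != audience or now >= claims["exp"]:
        return None
    return claims["sub"]


shared_key = b"demo-key-shared-by-telco-and-sp!"  # placeholder secret
token = issue_token(shared_key, "pseudonym-7", "casino.example", expires_at=1000)
print(verify_token(shared_key, token, "casino.example", now=999))
```

Binding the token to an audience keeps one service provider from replaying it at another, and the pseudonym avoids exposing the customer's Telco account identifier.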


23.3.2 Privacy Policy Enforcement 


Privacy policies are an essential instrument for the end customers to express their 
privacy preferences. Policy enforcement mechanisms ensure that these policies are 
followed. This section presents three different types of policy enforcement imple- 
mentations. 


23.3.2.1 Baseline Option - Manual Policy Enforcement by the End Customer 


Fig. 23.19: Authentication Delta Option 1.

Fig. 23.20: Authentication Delta Option 2.


Figure 23.21 illustrates the most common approach for a user to control the
flow of personal data. There is no actual configuration of privacy policies and no
dedicated process for enforcing policies. The end customer selectively provides the
data that he wants to share with the service provider. This approach is neither
very flexible nor scalable, but it is still the state of the art.


[Figure 23.21: sequence diagram between Online Casino Provider and End Customer:
1. Request Service; 2. Request Personal Data (list of required data, list of optional
data); 3. Provide Selected Personal Data; 4. Matching between required data and
received data; 5. Grant Service Access.]

Fig. 23.21: Privacy Policy Enforcement Baseline Option.


23.3.2.2 Delta Option 1 - Service Provider Enforces Privacy Policy 


In Delta Option 1 (Figure 23.22), the user provides his personal data together with
a privacy policy (data handling policy) to the service provider. The service provider
handles the provided data as specified in the privacy policy. This variant is applicable
to scenarios where the service provider is a potential provider of identity data to
third parties.


23.3.2.3 Delta Option 2 - Policy Enforcement by the Telco 


Fig. 23.22: Privacy Policy Enforcement Delta Option 1.

Fig. 23.23: Privacy Policy Enforcement Delta Option 2.


Figure 23.23 illustrates a policy enforcement design derived from the results
of different PrimeLife work packages. There is a dedicated policy engine on the Telco’s
side where the end customer can create and configure (or upload) his individual privacy
policy (1). When a service provider requests personal data from the end customer,
the policy enforcement point (PEP) checks this request against the user’s privacy
policy. In case of a mismatch, the end customer is informed about this (e.g., by a push
notification to his mobile phone, 2). He can then decide whether to provide the data
anyhow or to insist on his policy configuration (3). In the first case, the policy-filtered
data is provided to the service provider (4). In this option, both the IdM data assets
(personal data) and the IdM functional capability are provided by the Telco.
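
The check performed by the policy enforcement point can be sketched as a set comparison. The sketch assumes that a privacy policy is simply the set of attributes the end customer allows to be released and that `ask_user` models the notification and the customer's decision. It further assumes that, if the customer insists on the policy, only the attributes permitted by the policy are released, a case the text leaves open. All names are hypothetical.

```python
def enforce(policy, requested, personal_data, ask_user):
    """Return the attributes released to the service provider.

    policy        -- set of attribute names the customer allows to release
    requested     -- attribute names the service provider asks for
    personal_data -- attribute values stored at the Telco
    ask_user      -- callable(list of mismatching names) -> bool (override?)
    """
    mismatch = set(requested) - set(policy)
    allowed = set(requested) & set(policy)
    # The customer is only contacted if the request exceeds the policy.
    if mismatch and ask_user(sorted(mismatch)):
        allowed |= mismatch  # customer decides to provide the data anyhow
    return {name: personal_data[name] for name in sorted(allowed) if name in personal_data}


data = {"name": "Alice", "email": "alice@example.org", "birthdate": "1985-03-02"}
policy = {"name"}

# Customer insists on the policy: only the permitted attribute is released.
print(enforce(policy, ["name", "email"], data, ask_user=lambda names: False))
# Customer overrides the policy for this request.
print(enforce(policy, ["name", "email"], data, ask_user=lambda names: True))
```

Because both the data and the check reside at the Telco, the service provider only ever receives the filtered result.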


23.4 Related Work 


There is a considerable body of related work on economic issues of privacy- 
enhancing identity management by the project PRIME (Privacy and Identity Man- 
agement for Europe) [PRI]. This work demonstrates both the potential of privacy- 
enhancing identity management technology as a tool for ensuring and enforcing 
customers’ privacy in everyday transactions and the means by which it can be ap- 
plied by enterprises to their everyday business. 

Addressing the enterprise perspective, Fairchild and Ribbers [FR11] take up 
some existing concerns about the economic motivations for organisations to invest in 
privacy and identity management in general and explore a few intrinsic technology 
adoption drivers for enterprises. They propose the use of a cost benefit analysis by 
enterprises in order to decide whether to invest in the implementation of privacy and 
identity management technologies under consideration of expected changes such as 
costs, risks, trust, image and revenues. These are some examples of business-related 
effects that should be considered when evaluating or deciding to invest in privacy- 
enhancing identity management. 

Zibuschka, Rannenberg, and Kölsch [ZRK11] explored privacy and identity management
in a specific application domain: They provided a set of economic and regulatory
requirements for the commercialisation of privacy-enhanced location-based services
that could also be adapted as requirements for the commercialisation of more general
privacy-enhancing identity management services. Commonly, in such service scenarios,
different parties, e.g., a mobile network operator, an application provider and an
end customer, need to interact with each other. Each stakeholder has different
interests and requirements, assets and capabilities, as well as constraints and
limitations. These stakeholder-dependent factors influence the utility that the other
stakeholders can gain from providing or consuming a service. Hence, the involved
stakeholders, their utility-influencing factors and their interdependencies should
also be considered when evaluating or deciding to invest in privacy-enhancing
identity management services.

Kölsch, Zibuschka, and Rannenberg [KZR11] derived features from a wide range of
application prototype scenarios that a privacy-friendly identity management system
should support in order to fulfil the stakeholders’ interests and requirements.
Depending on how a system supports these features, each stakeholder gains a different
utility from providing or consuming it. Thus, the different feature characteristics
should also be considered when evaluating or deciding to invest in a
privacy-enhancing identity management system.

We extend this line of related work by proposing a structured method for evaluating
privacy-enhancing identity management services.


23.5 Summary and Future Work 


We have developed a method to evaluate privacy-enhancing IdM Services from the
perspective of a Telco acting as prospective IdM Service Provider. Some of the
seven steps of the method are structured following established economic methods.
The major goal of our approach is a simple method with a good trade-off between the
quality of the method’s results and the effort needed to carry out the work. To test
our method, we compared several IdM service scenarios using our valuation method, and
we will evaluate further scenarios in the future. The tests showed the need for a very
detailed and precise description of the scenarios. Making the scenarios more detailed
and precise also helps to improve their understanding.
To further develop the method, the following work is planned:


• Intensive testing of the method on real-world use cases.

• Further examination of the economic viability of privacy-enhancing Telco-based
IdM Services.

• Enhancement of the method based on the basic model of normative decision
theory.

• Enhancement and improvement of each step by more sophisticated methods and
concepts and by a more intensive focus on privacy-related effects.

• Simplification of the applicability by predefined and selectable components for
each step of the approach (e.g., predefined and selectable costs and benefits,
cause-effect chain elements).

• Reduction of possible errors caused by the subjectivity of the decision maker.


References Part V 


[APS02] Paul Ashley, Calvin Powers, and Matthias Schunter. From privacy promises to privacy
management: a new approach for enforcing privacy throughout an enterprise. In NSPW
'02: Proceedings of the 2002 workshop on New security paradigms, pages 43-50, New
York, NY, USA, 2002. ACM.

[Bac10] D. Bachfeld and C. Mulliner. Risiko Smartphone: Spionageangriffe und Abzocke auf
Android und iPhone. c’t - Magazin für Computertechnik, 20:80-82, 2010.

[BDS04] Michael Backes, Markus Duermuth, and Rainer Steinwandt. An algebra for composing
enterprise privacy policies. In Proceedings of 9th European Symposium on Research in
Computer Security (ESORICS), volume 3193 of Lecture Notes in Computer Science, pages
33-52. Springer, September 2004.

[Ber09] Marc-Michael Bergfeld. Global Innovation Leadership: The strategic development of
worldwide innovation competence. BOD, Norderstedt, 2009.

[BHTB05] James Backhouse, Carol Hsu, Jimmy C. Tseng, and John Baptista. A question of trust.
Commun. ACM, 48:87-91, September 2005.

[BNP09] Laurent Bussard, Anna Nano, and Ulrich Pinsdorf. Delegation of access rights in
multi-domain service compositions. Identity in the Information Society, 2(2):137-154,
December 2009.

[BT10] Thomas de Buhr and Stefan Tweraser. My Time is PRIME Time. pages 69-91, 2010.

[BZW06] Sruthi Bandhakavi, Charles C. Zhang, and Marianne Winslett. Super-sticky and
declassifiable release policies for flexible information dissemination control. In WPES '06:
Proceedings of the 5th ACM workshop on Privacy in electronic society, pages 51-58, New
York, NY, USA, 2006. ACM.

[CDK05] George Coulouris, Jean Dollimore, and Tim Kindberg. Distributed Systems. Concepts
and Design. Addison Wesley, 4 edition, 2005.

[CK05] Luis Felipe Cabrera and Chris Kurt. Web Services Architecture and Its Specifications:
Essentials for Understanding WS-*. Microsoft Press, 2005.

[CL01] Jan Camenisch and Anna Lysyanskaya. Efficient non-transferable anonymous multi-
show credential system with optional anonymity revocation. In Birgit Pfitzmann, editor,
Advances in Cryptology - EUROCRYPT 2001, volume 2045 of LNCS, pages 93-118.
Springer Verlag, 2001.

[CLS11] Jan Camenisch, Ronald Leenes, and Dieter Sommer, editors. PRIME - Privacy and
Identity Management for Europe, volume 6545 of Lecture Notes in Computer Science.
Springer Berlin, 2011.

[FR11] Alea Fairchild and Piet Ribbers. Privacy-Enhancing Identity Management in Business,
chapter 7, pages 107-129. Volume 6545 of Camenisch et al. [CLS11], 2011.

[KR00] David P. Kormann and Aviel D. Rubin. Risks of the passport single signon protocol.
Comput. Netw., 33:51-58, June 2000.




[KZR11] Tobias Kölsch, Jan Zibuschka, and Kai Rannenberg. Privacy and Identity Management
Requirements: An Application Prototype Perspective, chapter 28, pages 723-744. Volume
6545 of Camenisch et al. [CLS11], 2011.

[Mob10] MobeyForum. Alternatives for banks to offer secure mobile payments, whitepaper of the
MobeyForum, 2010.

[MS09] Sebastian Meissner and Jan Schallaböck. Requirements for privacy-enhancing service-
oriented architectures. Public project deliverable H6.3.1, PrimeLife Consortium,
November 2009.

[PRI] PRIME. Privacy and Identity Management for Europe. https://www.PRIME-project.eu/.

[Pri08a] PrimeLife WP6.2. Infrastructure for trusted content. In M.-M. H. Bergfeld,
W. Hinz, and S. Spitz, editors, PrimeLife Deliverable D6.2.1. PrimeLife,
http://www.PrimeLife.eu/results/documents, 2008.

[Pri08b] PrimeLife WP6.3. Advancement and integration of concepts for secure and dynamic
creation of mobile services. In M.-M. H. Bergfeld and S. Spitz, editors, PrimeLife
Deliverable D6.3.1. PrimeLife, http://www.PrimeLife.eu/results/documents, 2008.

[Pri09] PrimeLife WP1.3. Requirements and concepts for identity management throughout
life. In Katalin Storf, Marit Hansen, and Maren Raguse, editors, PrimeLife Heartbeat
H1.3.5. PrimeLife, http://www.PrimeLife.eu/results/documents, November 2009.

[Rüt10] C. Rütten. Ausgespäht: Sicherheit von Apps für Android und iPhone. c’t - Magazin für
Computertechnik, 20:86-91, 2010.

[SSP08] Joachim Swoboda, Stephan Spitz, and Michael Pramateftakis. Kryptographie und IT-
Sicherheit. Vieweg+Teubner, 2008.

[vdBL10] Bibi van den Berg and Ronald Leenes. Audience segregation in social network sites.
In Ahmed K. Elmagarmid and Divyakant Agrawal, editors, SocialCom/PASSAT, pages
1111-1116. IEEE Computer Society, 2010.

[W3C02] W3C. A P3P preference exchange language 1.0 (APPEL1.0), 2002.

[W3C06] W3C. The platform for privacy preferences 1.1 (P3P1.1) specification, 2006.

[Web10] V. Weber. Zwickmühle: Der Streit um Blackberry-Sicherheit. c’t - Magazin für
Computertechnik, 20:144-161, 2010.

[ZRK11] Jan Zibuschka, Kai Rannenberg, and Tobias Kölsch. Location-Based Services,
chapter 25, pages 665-681. Volume 6545 of Camenisch et al. [CLS11], 2011.


Part VI 
Privacy Live 




Introduction 


One of PrimeLife’s main objectives is to make “privacy live” by raising aware- 
ness concerning privacy and security risks as well as to present solutions to maintain 
one’s private sphere. The project contributes to this ambitious aim by transferring 
the mature results of the project into practice. There are various means to achieve 
this, and PrimeLife has applied a mixture of different approaches. For instance, 
PrimeLife has devoted itself to support education in the field of privacy and identity 
management: Together with the International Federation for Information Process- 
ing (IFIP), we conducted two summer schools for students and experts to exchange 
information on research questions and possible ways to tackle them. Also, this book 
represents one of the channels PrimeLife has chosen to convey its messages.
Furthermore, PrimeLife became a major player in the cooperation with several other
European and national projects working on privacy and identity management. All
projects involved profited considerably from the synergies stemming from this
cooperation.

This chapter focuses on three important areas in which PrimeLife contributed to
success in improving privacy and identity management: PrimeLife’s Open Source
contributions, the project’s contributions to the standardisation discussion, and
finally best practice solutions that emerged from PrimeLife’s work as well as from the
cooperation with so many interested people during the project’s lifetime, as explained
in the following.

There are a number of different reasons why people and companies release Open
Source software. These include the fundamental belief that software should in
principle be freely available, the wish to tap into a whole community of free
developers, the wish to offer one’s work to other users, or the intention to publish
software much like research papers are published. For software that implements
security and privacy mechanisms, an important reason is certainly that publishing the
source code allows for the review of an implementation, so that users can be assured
that there are no hidden trapdoors and that the implemented algorithms really provide
security. In addition, people are often not willing to spend money on security
(software) even if they see a potential need for it. SSH and SSL are examples of
security protocols whose implementations are freely available, are now widely adopted
and, in fact, have enabled the growth of commerce over the Internet.

The example of SSH and SSL exhibits another important reason for Open Source: 
to drive adoption of infrastructure components and standards. Indeed, here Open 
Source activities and standardisation go hand in hand. Both help to remove technical
barriers, open up new markets, and enable new economic models. At the same time,
standards and free implementations of them increase opportunities for product
differentiation, competition, and services.

PrimeLife has made available a number of Open Source components to share its
results and to allow them to be used by other parties. Initially, PrimeLife wanted
to interact with some of the existing Open Source activities to have them include
PrimeLife’s results in their projects. This quickly turned out to be infeasible
resource-wise. Instead, the PrimeLife project decided to share its results as
stand-alone pieces and to invest its resources in making these pieces as usable as
possible. The project hopes that other projects will pick them up, experiment with
the concepts, and incorporate them into real applications. Some of the code is
experimental, while other code is quite stable and (limited) support is offered
by the authors. In Chapter 24 we give an overview of PrimeLife’s Open Source
contributions.

As a second way to make privacy live, PrimeLife has contributed to standardisa- 
tion organisations. The main focus here was on the relevant ISO working groups and 
the W3C-related bodies. Within ISO, PrimeLife concentrated on JTC 1/SC 27 work- 
ing group 5, in particular on the Framework for Identity Management and the Pri- 
vacy Reference Architecture. As W3C was a partner in PrimeLife, the project took 
advantage of the W3C working style and thus organised a number of workshops 
for interacting with the internet development community. Besides this, we have also 
engaged with other standardisation bodies such as some of the OASIS committees 
or the Internet Engineering Task Force (IETF). Chapter 25 describes the
standardisation work that the PrimeLife project has already done and the work that it
considers still needs to be done.

During PrimeLife’s work on privacy and identity management, it became apparent
that the current state-of-the-art information technologies used for privacy-relevant
data processing are certainly not satisfactory: In many cases, they neither match the
provisions of European data protection regulation nor address society’s and
individuals’ needs for maintaining privacy throughout a full lifetime.
Quite the contrary — society, with today’s IT systems, seems to be ill-prepared for 
the challenges ahead of us; instead of protecting the people’s privacy, we will have 
to face additional risks. PrimeLife was approached by several other projects and 
discussed its vision of privacy for life with many stakeholders. Part of its work con- 
sisted of elaborating requirements and recommendations for all stakeholder groups 
involved in the complex area of privacy and identity management. In Chapter 26, 
we present a selection of best practice solutions that address different stakeholders. 
They tie together a few ideas provided as Open Source components, some material 
offered to standardisation initiatives, and various recommendations from PrimeLife 
deliverables and discussions with other projects. All of these approaches belong to 
PrimeLife’s legacy, which can and will survive even after the end of the project. 


Chapter 24 
Open Source Contributions 


Jan Camenisch, Benjamin Kellermann, Stefan Köpsell, Stefano Paraboschi,
Franz-Stefan Preiss, Stefanie Pötzsch, Dave Raggett, Pierangela Samarati, and
Karel Wouters


24.1 Introduction 


Privacy protection tools can be characterised by the number of parties that have to 
cooperate so that the tools work and achieve the desired effect [Pfi01]: Some privacy 
protection tools can be used stand-alone, without the need for the cooperation of 
other parties. Others require that the communication partners use the same tools. 
Some tools only function when being supported by an appropriate infrastructure 
that quite often is currently not in place. 

For all of these categories, one finds Open Source tools. Tools that one can just 
use include numerous browser extensions, such as CookieSafe, CSLite, NoScript, 
Ghostery, and also file encryption software such as TrueCrypt. Tools allowing the 
user to communicate anonymously also can be employed after being installed on the 
user’s computer, but need the cooperation of other users or servers on the Internet, 
such as AN.ON, TOR, I2P, and GnuNET. Next, among the tools that require the 
cooperation of the communication partners is encryption software such as GnuPG, 
OpenPGP, and OpenSSL. Finally, there are a couple of Open Source identity man- 
agement frameworks available (Higgins, OpenID) as well as access control com- 
ponents (OpenSAML and some XACML implementations) which offer a (limited) 
form of privacy if used properly. 

PrimeLife has produced a survey of the Open Source landscape, which is available
from http://www.primelife.eu/ (Deliverables D3.4.1 and D3.4.2). In this chapter, we
describe the software tools that the PrimeLife project has made available on
http://www.primelife.eu/results/opensource/. For these, we also describe what
related tools are available.

The software tools that PrimeLife has produced range from research prototypes
to almost product-quality components. A few, such as Identity Mixer (Idemix), have
their origins in the earlier PRIME project, but most are implementations of concepts
that were conceived in PrimeLife and can be seen as the project’s practical research
results. We present here a selection of PrimeLife’s Open Source contributions.


24.2 Social Software 


Social software such as web forums, social networks (e.g., Facebook, LinkedIn), and
wikis has become very popular. In all these media, users share a lot of personal
information with the provider, with the other users of the media, and often even
with the entire Internet community. Indeed, it has become hard, if not impossible,
for users to know and control who has access to their shared data. PrimeLife has
developed and published a number of different solutions to alleviate this.

We have developed our own social network site Clique² that allows users to easily
determine who is the audience of their posts, i.e., to set the access control policy the
provider must enforce for their post. This assumes of course that the provider is 
trusted to enforce the policy and not to leak their data. If one does not have this 
trust, one could use PrimeLife’s Scramble! browser plug-in. This plug-in encrypts 
all submitted data before it gets posted to the provider in such a way that only 
one’s friends can read the data. Thus, the access control policy is enforced with 
encryption. 

For online forums, we have developed a privacy awareness tool (Personal Data MOD)
that informs users what data they are about to disclose to whom. Furthermore, we
have developed a privacy-enhancing access control system for forums. It is based on
access control policies and anonymous credentials and empowers the forum user to
specify who can access her thread or forum post. We have implemented both of these
prototypes for the phpBB forum engine.

In the following we describe these tools in more detail. 


24.2.1 Clique — Privacy-Enhanced Social Network Platform 


Clique is a modification of the Elgg social networking platform (cf. Section 2.2.3). 
Clique provides users with a social network platform that enables them to keep 
control over their privacy. This includes, for example, fine grained access control 
and configuration of multiple faces (e.g., family, personal, professional) that can be 
used for interactions with other users. When posting a data item, e.g., name, birthday 
or profile photo on the site, the user can define for every single other user whether 
they should be able to see it or not. 
Clique achieves this with the following features: 


² http://clique.primelife.eu/

• Collections: contacts are organised in collections, roughly corresponding to
social circles. Users can form different collections by defining close friends,
family, colleagues, former school friends, etc.

• Flexible access control to content: all content carries attribute certificate
policies, which users define by moving collections and contacts in a plain and
easy-to-use graphical user interface.

• Visual audience indicators: all content is labelled with icons showing who has
access to the information.

• Fading relations: depending on their activity, contacts slowly disappear. At
first this happens through visual indicators (a coloured border around the user
icon), later by closing access to one’s data for the automatically de-friended
contact.
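The interplay of collections, per-item audiences and fading relations can be sketched
as follows. This is a minimal illustration and not Clique’s actual code (Clique is
built on Elgg); the class Profile and its methods are hypothetical names, and the
gradual visual fading is reduced to its final step, the removal of a contact.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Profile:
    owner: str
    # Collection name -> contacts in that collection (a contact may be in several).
    collections: Dict[str, Set[str]] = field(default_factory=dict)
    # Item name -> (collections granted access, individually granted contacts).
    audiences: Dict[str, Tuple[Set[str], Set[str]]] = field(default_factory=dict)

    def set_audience(self, item: str,
                     collections: Iterable[str] = (),
                     contacts: Iterable[str] = ()) -> None:
        self.audiences[item] = (set(collections), set(contacts))

    def audience_of(self, item: str) -> Set[str]:
        """Everyone who may see `item`; this is what an audience icon would show."""
        granted, contacts = self.audiences.get(item, (set(), set()))
        members = set(contacts)
        for name in granted:
            members |= self.collections.get(name, set())
        return members

    def can_see(self, viewer: str, item: str) -> bool:
        return viewer == self.owner or viewer in self.audience_of(item)

    def defriend(self, contact: str) -> None:
        """Final stage of a fading relation: the contact loses all access."""
        for members in self.collections.values():
            members.discard(contact)
        for _, contacts in self.audiences.values():
            contacts.discard(contact)
```

An item without an explicit audience is visible to its owner only, which keeps the
default on the restrictive side.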


24.2.2 Scramble! — Audience Segregation by Encryption 


Existing social network providers, such as Facebook and MySpace, implement
access control for users’ data. These mechanisms offer no protection against the
providers themselves, as these have access to all users’ information. To address this,
we have developed a model [BKW09] and implemented a Firefox extension named
Scramble! that allows not only for the definition of access control rules for
audience segregation, but also for the enforcement of users’ access control
preferences by using encryption techniques (cf. Section 2.2.4).

Scramble! implements user-friendly access control enforcement by making decryption
transparent to the user. In other words, the extension parses the page the user has
requested and shows the decrypted version of only those data that the user has
access to. Because access rights are managed on the client side, the user has full
control over the enforcement of her privacy preferences.

We use the OpenPGP standard as the encryption mechanism. It provides us with a
public key infrastructure and a key management model, and it also allows us to
encrypt one message for multiple recipients. By using GnuPG we can also encrypt for
anonymous recipients, although this is still vulnerable to active attacks. Apart
from posting large amounts of encrypted data directly, it is also possible to use a
tinyurl module, which posts the encrypted data to a third-party server. In this way,
the problem of the large ciphertext size, which currently grows linearly with the
number of users that are granted access, is minimised.
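The linear growth follows from the way OpenPGP handles several recipients: the
message is encrypted once with a session key, and that session key is then encrypted
separately for each recipient. The sketch below models only the resulting sizes and
the indirection via a short link. It performs no encryption, and the byte counts, the
class ShortLinkStore and the URL short.example are illustrative assumptions rather
than values or names taken from Scramble!.

```python
import hashlib

# Assumed sizes in bytes. Real OpenPGP packet sizes depend on key type and options.
SESSION_KEY_PACKET = 271  # hypothetical: one encrypted session key per recipient
PAYLOAD_OVERHEAD = 40     # hypothetical: packet headers and integrity protection


def envelope_size(plaintext_len: int, recipients: int) -> int:
    """Approximate size of a message encrypted once for `recipients` readers."""
    return plaintext_len + PAYLOAD_OVERHEAD + recipients * SESSION_KEY_PACKET


class ShortLinkStore:
    """Stand-in for the third-party server that keeps the ciphertext."""

    def __init__(self, base_url: str = "https://short.example/"):
        self.base_url = base_url
        self._blobs = {}

    def put(self, ciphertext: bytes) -> str:
        """Store the ciphertext and return the short URL that gets posted instead."""
        key = hashlib.sha256(ciphertext).hexdigest()[:10]
        self._blobs[key] = ciphertext
        return self.base_url + key

    def get(self, url: str) -> bytes:
        """Fetch the ciphertext back, as a recipient's browser extension would."""
        return self._blobs[url[len(self.base_url):]]
```

Whatever the number of recipients, the text posted to the social network then has
the constant length of the short URL, while the full ciphertext lives elsewhere.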

We have extended the plug-in to use a Java-based embedded OpenPGP implementation,
using the Bouncy Castle Open Source library. In this way, we provide not only a
GnuPG-dependent implementation, but also a more user-friendly version of the
extension with a built-in implementation of the OpenPGP standard. This provides users
with a plug-and-play extension that offers the OpenPGP public key infrastructure and
full access control enforcement mechanisms for use on social network sites. Because
the implementation is generic, the extension can also be used on a broader range of
Web 2.0 applications, such as e-mail and blogs. Scramble! has been released as an
Open Source Firefox extension under the EPL.

Luo, Xie, and Hengartner provide a similar program called FaceCloak [LXH09].
Its strong point is that it does not post ciphertexts or tiny URLs to the social
network, but fake information that looks as expected to the social network. The real
content is encrypted and stored on a third-party server. However, FaceCloak uses a
symmetric key for this encryption, which is then distributed, e.g., by e-mail, to the
intended recipients, and hence users have to manage these keys by themselves. In this
respect, our solution is much more scalable: because of the use of public key
encryption and PGP key servers, users do not have to worry about the management of
keys.


24.2.3 Privacy-Awareness Support for Forum Users: Personal 
Data MOD 


When interacting with others on the Internet, users share a lot of personal data with a
potentially large but “invisible” audience. Maintaining control over personal data is
an important issue, and therefore users first need to be aware of which data they are
disclosing to whom.

There are tools available that give users feedback about their IP address, location,
browser, etc. and that can be integrated into websites (e.g., [Mof10]). However,
showing users their IP address does not imply that they know what this means and
who else may see this information. Therefore, we have developed a tool that supports
the users’ privacy awareness in a comprehensive way, i.e., a tool that informs them
which explicitly and implicitly disclosed data are visible to whom. For concreteness,
we have chosen to do this for the popular phpBB forum software, which is available
under a copyleft license and is developed and supported by an Open Source
community⁴. Thus, the objective of the Personal Data MOD that we developed is
to provide information about the visibility of personal data to phpBB forum users and
thereby to support these users’ privacy awareness.

Personal Data MOD is displayed to forum users at the top of the forum (see Fig. 24.1,
or Fig. 24.3 for an early version). On the left side, a user is reminded about personal
data from her profile and its visibility. The user is also informed about additional
information that is automatically transmitted to the forum provider when visiting
the forum. On the right side, a user is notified about the visibility of her latest
actions, and she also learns that the forum does not “forget” old actions, but that
everything is logged and can be looked up even after a longer time period. Personal
Data MOD distinguishes four visibility classes for a user’s personal data (see
Fig. 24.2).

⁴ http://www.phpbb.com/

[Screenshot: panel “Privacy and visibility of your data” with the sections “That’s
you” (profile data), “That’s where you are” (location, Internet provider, operating
system and browser) and “That’s what you did” (recent reading, posting and private
message actions); each entry is marked with an eye icon naming its audience: board
provider, communication partner, registered users, or anyone.]

Fig. 24.1: User interface of Personal Data MOD.


WHAT ABOUT THE EYE ICONS?

The eye icons symbolize the visibility of the presented data. The data can be
visible for

• the board provider only,

• the board provider and direct communication partners only,

• the board provider and registered users only, or

• the board provider, all registered users and all guests of the board.

The tooltip of each icon has detailed information. Be aware that all data can become
public in case of a security breach.


Fig. 24.2: Eye icons in different shades of red representing four visibility classes for 
personal data. 
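Because each of the four visibility classes contains the previous one, they can be
modelled as an ordered enumeration. The following sketch is an assumption-laden
illustration and not code from Personal Data MOD (which is a phpBB modification): the
names Visibility, Viewer and ACTION_VISIBILITY are hypothetical, and the mapping from
forum actions to classes merely follows the kind of entries shown in Fig. 24.1.

```python
from enum import IntEnum


class Visibility(IntEnum):
    """The four visibility classes, ordered from narrowest to widest audience."""
    PROVIDER_ONLY = 1
    COMMUNICATION_PARTNERS = 2
    REGISTERED_USERS = 3
    ANYONE = 4


class Viewer(IntEnum):
    """Who is looking at a data item; lower values belong to more audiences."""
    PROVIDER = 1
    COMMUNICATION_PARTNER = 2
    REGISTERED_USER = 3
    GUEST = 4


def can_see(viewer: Viewer, visibility: Visibility) -> bool:
    """A viewer sees an item if the item's class is at least as wide as the viewer's rank."""
    return int(viewer) <= int(visibility)


# Hypothetical mapping from forum actions to visibility classes.
ACTION_VISIBILITY = {
    "read_topic": Visibility.PROVIDER_ONLY,
    "send_private_message": Visibility.COMMUNICATION_PARTNERS,
    "post_in_members_forum": Visibility.REGISTERED_USERS,
    "post_in_public_forum": Visibility.ANYONE,
}


def audience_label(action: str) -> str:
    """Text that could accompany the eye icon next to an action."""
    return ACTION_VISIBILITY[action].name.replace("_", " ").lower()
```

The board provider passes every check, which mirrors the fact that the provider is
part of all four audiences.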


If a (privacy-aware) user visits the forum with Personal Data MOD via anonymising
services such as AN.ON or TOR, Personal Data MOD provides feedback that
the anonymising service is working correctly by indicating a location, browser and
operating system that should not match the user’s actual system.


24.2.4 Privacy-Enhancing Selective Access Control for Forums


To the best of our knowledge, no Open Source tools are available that allow forum
users to specify the audiences of their posts. Therefore, we developed a phpBB
extension that upgrades the access control features of the phpBB forum software so
that users, instead of administrators, can define who should have access to their own
contributions (e.g., thread, forum post) [PBP10, Pri10].

Since in a forum users do not necessarily know each other by name, the ac- 
cess control setting is done based on the other users’ properties (e.g., is over 18 or 
lives in Dresden). With the developed extension for the Open Source phpBB fo- 
rum software, the user as originator is able to specify access control policies for her 
contribution. 

The phpBB extension modifies and upgrades the original access control features 
of the forum, so that they work together with the access control components that 
were developed in the project PRIME. These components encompass: 


1. creating and editing access control policies, 
2. using anonymous credentials, and 
3. checking access control rights. 


In a forum with such extended access control features, each user is allowed to
specify which properties someone has to possess in order to access the user’s
contribution. It is possible to set access control policies not only for the whole
forum or for topics, but also on a finer-grained level for threads and even single
posts (cf. Fig. 24.3). This means that the user is able to define a particular
audience for each single contribution and, thus, privacy-enhancing identity management
is realised by audience segregation based on the properties of the audience.
Technically, the process
of creating a new resource (e.g., a thread) includes the originator of that resource 
receiving the corresponding credential (e.g., cred: Owner-Thread-ID). Further, a set 
of default access control policies is created, which ensure that only administrators 
who show an administrator credential or moderators who possess a moderator cre- 
dential gain the required access needed to fulfil their roles. The owner of a resource 
possessing the owner credential always has access to that resource and can modify 
the access control policies to, e.g., also allow users who live in Dresden and who 
can show a LivesInDresden credential read and write access to the resource. 
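The policy logic described above can be sketched as follows. The sketch makes strong
simplifying assumptions: credentials are plain strings rather than anonymous
credentials, read and write access are not distinguished, and the function names are
hypothetical. It is not the phpBB extension or the PRIME components themselves.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Set


@dataclass
class Resource:
    resource_id: str
    # Each policy is a set of credentials that must all be shown.
    # Satisfying any single policy grants access.
    policies: List[FrozenSet[str]] = field(default_factory=list)


def owner_credential(resource_id: str) -> str:
    return f"Owner-{resource_id}"


def create_resource(resource_id: str, originator_credentials: Set[str]) -> Resource:
    """Create a thread or post, hand the originator the owner credential,
    and install the default policies for owner, administrators and moderators."""
    originator_credentials.add(owner_credential(resource_id))
    defaults = [
        frozenset({owner_credential(resource_id)}),
        frozenset({"Administrator"}),
        frozenset({"Moderator"}),
    ]
    return Resource(resource_id, defaults)


def has_access(resource: Resource, shown: Set[str]) -> bool:
    return any(policy <= shown for policy in resource.policies)


def add_policy(resource: Resource, shown: Set[str], required: Iterable[str]) -> None:
    """Only the holder of the owner credential may widen the audience."""
    if owner_credential(resource.resource_id) not in shown:
        raise PermissionError("only the owner can modify the access control policies")
    resource.policies.append(frozenset(required))
```

After the owner adds a policy requiring LivesInDresden, any user who can show that
credential gains access, without the owner having to know that user by name.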


24.3 Dudle — Privacy-enhanced Web 2.0 Event Scheduling 


A number of websites have made scheduling any kind of event or running a poll
much simpler. Anyone can set up a poll or suggest times and dates for a meeting,
send the URL to the people who should participate, and then, once all people have
entered their preferences, schedule the meeting or read the results of the poll.
Current implementations for event scheduling are available as stand-alone
applications [Naf10, FS10, Pro10, Sol10, Pen08] and extensions to other Web 2.0
applications (e.g., in wikis or Groupware) [Tsa10, ope10, egr10].

[Screenshot: phpBB thread “Fit For Summer” started by Bob on Wed Aug 19, 2009. A
banner above the thread reads “Your Internet address is 141.76.46.106. Posts in this
thread are visible for all Internet users.” Bob’s first post is readable; the second
post is shown as “access denied”.]

Fig. 24.3: phpBB forum with extension for privacy-enhanced access control (note:
access to second post is denied).

As in most Web 2.0 applications, privacy is only a secondary goal. The fact that
everybody may create polls, cast votes in existing polls, see the results of other
polls, and even revise cast votes in running polls makes the application easy to use
but also eliminates security and privacy. When participating in a poll, one has to
share personal information with the server, the other participants, and even with the
whole world.

Some of the applications recognised that users demand some privacy. Therefore,
some solutions offer the possibility to create “hidden polls”, in which only the
administrator can see all votes. To enhance security, it is possible to restrict voting
and the modification of votes to users with a login and password. However, no application tries
to overcome the trust in the application provider or the poll initiator.

Applications that especially target privacy and security requirements are special
e-voting applications [Adi10, Adi08, CCM08]. However, these are not fully web-based
and therefore cannot be considered easy-to-use Web 2.0 applications.

PrimeLife set out to demonstrate that one can indeed schedule a meeting or
run a poll in a more privacy-preserving manner. The result of this effort is the
Dudle [KB09, Kel11] Web 2.0 application, which is available from PrimeLife’s
website⁶ (Fig. 24.4).


6 http://www.primelife.eu/
Fig. 24.4: Screenshot of Dudle. The single votes are encrypted in the browser using
JavaScript.


It allows users to create polls easily and to set a number of privacy preferences. 
In particular, participants can submit their entries in encrypted form and the matching
is done in such a way that the server as well as all the participants only learn
the result. This is comparable to the e-voting applications mentioned, but, due to 
dropping the requirements of external tool installation, it is possible to arrange and 
perform these multilateral secure polls within a web browser, which makes it easy 
to use, even for average users. 
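
To give an intuition for how a server can compute a result without seeing individual entries, the sketch below uses pairwise additive masking in the style of a DC-net. This is an illustrative technique chosen for the example, not a description of the actual Dudle protocol; the function names and the modulus are assumptions. For brevity the masks of all pairs are generated in one function, whereas in a real deployment each pair of participants would agree on its shared secret in their browsers.

```python
import secrets

MODULUS = 2**32  # assumed to be larger than any possible tally


def mask_votes(votes):
    """Return one masked value per participant for a single time slot.

    Every pair of participants (i, j) shares a random value r; participant i
    adds r and participant j subtracts r, so all masks cancel in the sum.
    """
    masked = list(votes)
    for i in range(len(votes)):
        for j in range(i + 1, len(votes)):
            r = secrets.randbelow(MODULUS)
            masked[i] = (masked[i] + r) % MODULUS
            masked[j] = (masked[j] - r) % MODULUS
    return masked


def tally(masked):
    # The server only adds up the masked values it received.
    return sum(masked) % MODULUS
```

Each masked value on its own looks random to the server, yet `tally(mask_votes([1, 0, 1, 1]))` yields the number of participants available in that slot.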


24.4 The Privacy Dashboard 


The Privacy Dashboard⁷, developed within the PrimeLife project, is an extension for
the Firefox browser that enables you to see some of the practices that websites are 
using, e.g., whether they include third party content, perhaps with lasting cookies 
that can track you across the Web, or are using a variety of other techniques. 

The Dashboard collects information about the current website as pages load. This 
is presented by an icon that appears on the browser’s navigation toolbar next to the 
location field. The icon displays one of three aspects: a happy face, a thoughtful face 
and an indignant face. This is based upon rules of thumb that classify the website. 
The indignant face is shown if the site uses external third party HTTP cookies or 
external third party Flash cookies. The thoughtful face appears if the site has lasting 
HTTP cookies, Flash cookies or external third party content, and lacks a link to a
machine readable (P3P) privacy policy. Otherwise the happy face appears. These
rules of thumb are to some extent arbitrary, and simply intended to draw the user’s
attention to the data collected.


Fig. 24.5: Screenshot of the Privacy Dashboard.


7 http://www.primelife.eu/results/opensource/76-dashboard
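
The rules of thumb for the icon can be written down in a few lines. The following Python sketch is one reading of the description above, assuming that the missing P3P link is required in addition to any one of the three "thoughtful" conditions; the field and function names are invented for this example and are not the extension's own.

```python
from dataclasses import dataclass


@dataclass
class SiteObservations:
    external_third_party_http_cookies: bool = False
    external_third_party_flash_cookies: bool = False
    lasting_http_cookies: bool = False
    flash_cookies: bool = False
    external_third_party_content: bool = False
    has_p3p_policy_link: bool = False


def classify(site):
    """Return 'indignant', 'thoughtful' or 'happy' for the observed site."""
    if (site.external_third_party_http_cookies
            or site.external_third_party_flash_cookies):
        return "indignant"
    stores_or_embeds = (site.lasting_http_cookies
                        or site.flash_cookies
                        or site.external_third_party_content)
    if stores_or_embeds and not site.has_p3p_policy_link:
        return "thoughtful"
    return "happy"
```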

The first time you visit a website, the Privacy Dashboard displays a privacy alert 
in a notification bar at the top of the page. This is the same bar used by Firefox 
to ask users for permission to save their user ID and password for the site. The 
notification bar does not appear if the site is classified with the happy face. You are 
invited to choose between ‘accept always’ (i.e., don’t bother me again for this site),
‘protect me’, or ‘tell me more’. The ‘protect me’ button ensures that for subsequent
loads, scripting is disabled along with cookies and third party content. The ‘tell me 
more’ button displays the Privacy Dashboard dialogue window (see Fig. 24.5). The 
dialogue can also be displayed at any time by clicking on the Privacy Dashboard 
icon on the navigation toolbar. 
The Privacy Dashboard dialogue has five tabs labelled “Data Track”, “Location”,
“Current Website”, “Share Findings” and “About”. By default it opens with the cur- 
rent website tab. This shows information about the current website, your preferences 
for this site, and some buttons for checking the website with third party tools that, 
if clicked, open up in a new browser tab. The buttons cover Norton SafeWeb, Free 
Trust Seal, and TRUSTe. 

The information shown for the site covers HTTP cookies, Flash cookies (Flash
Local Shared Objects), third party content, DOM storage, geolocation, HTML5
pings, invisible images and suspicious URLs indicating the possible use of web
bugs (tracking devices). Cookies are classified according to whether they are retained
beyond the current browser session, and whether they are used for this site,
an internal third party site (one with a common base domain) or are for external
third party sites.
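
The two-way classification of cookies can be sketched as follows. This is a simplified illustration: the base domain is approximated by the last two labels of the host name, which is wrong for suffixes such as co.uk (a real implementation would consult a public suffix list), and the function names are assumptions made for this example.

```python
def base_domain(host):
    # Naive approximation: keep the last two labels of the host name.
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def classify_cookie(page_host, cookie_host, expires):
    """Return (lifetime, party) for a cookie seen while loading page_host.

    expires is None for a session cookie, otherwise an expiry timestamp.
    """
    lifetime = "session" if expires is None else "lasting"
    cookie_host = cookie_host.lstrip(".").lower()
    if cookie_host == page_host.lower():
        party = "this site"
    elif base_domain(cookie_host) == base_domain(page_host):
        party = "internal third party"
    else:
        party = "external third party"
    return lifetime, party
```

For a page on www.example.com, a cookie from static.example.com with an expiry date would be reported as a lasting cookie of an internal third party.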

The Data Track tab in the Privacy Dashboard dialogue (cf. Section 13.4) allows 
you to query the database of information the extension collects on each site you 
visit. You select from a drop-down list of queries together with a text box for typing 
in the domain name for a website, or a datum name or value. The queries include: 


• Which data has been sent to a given website?
• Which sites has a given datum value been sent to?
• Which sites has a given datum name been sent to?
• Which sites use long lasting cookies?
• Which sites use session cookies?
• Which sites use Flash cookies?
• Which sites use DOM storage?
• Which sites are third parties?
• Which internal third parties are used by a given site?
• What cookies are used by a given site?
• Which sites use invisible images?
• Which sites use HTML5 pings?
• Which sites offer P3P policies?
• Which sites have you given access to your geographic location?
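
A few of these queries are sketched below against an in-memory SQLite database. The schema (tables `sent_data` and `cookies`) and the function names are hypothetical and were chosen only for this example; the text does not describe how the extension actually stores its data.

```python
import sqlite3


def open_store():
    # Hypothetical schema: one row per datum sent and one row per cookie seen.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE sent_data (site TEXT, name TEXT, value TEXT);
        CREATE TABLE cookies (site TEXT, name TEXT, lasting INTEGER);
    """)
    return db


def data_sent_to(db, site):
    # Which data has been sent to a given website?
    return db.execute(
        "SELECT name, value FROM sent_data WHERE site = ? ORDER BY name",
        (site,)).fetchall()


def sites_receiving_value(db, value):
    # Which sites has a given datum value been sent to?
    rows = db.execute(
        "SELECT DISTINCT site FROM sent_data WHERE value = ? ORDER BY site",
        (value,))
    return [row[0] for row in rows]


def sites_with_lasting_cookies(db):
    # Which sites use long lasting cookies?
    rows = db.execute(
        "SELECT DISTINCT site FROM cookies WHERE lasting = 1 ORDER BY site")
    return [row[0] for row in rows]
```

The remaining queries in the list would follow the same pattern, each selecting on a different column or table.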


The Dashboard allows you to set personal privacy preferences on a site by site 
basis. The preferences are available on two levels: simple and advanced, offering 
a choice between three predefined levels of privacy (carefree, thoughtful and para- 
noid), or detailed control over a range of settings: 


• Never block content from this site.
• Block external third parties.
• Block external third party cookies.
• Block all lasting cookies.
• Clear Flash cookies.
• Disable web page scripting.
• Disable access to your geolocation.
• Disable HTML5 pings.
• Don’t send HTTP referrer header.
• Disable access to DOM storage.


The Firefox extension is able to implement these by directly intercepting and 
blocking HTTP requests, or by setting browser options. 

The latter is imperfect since the option to disable scripting applies to all new 
pages and not just to the current tab. The extension does its best to limit changes to 
browser-wide options to the time the page is being loaded, but if several pages are 
being loaded concurrently on different tabs, then problems may well arise. Hope- 
fully, this problem will be resolved by browser vendors offering more fine grained 
options that can be set on a per tab or per website basis. 

The Adobe Flash plug-in is ubiquitous and installed on pretty much all web
browsers. It runs in isolation from the rest of the web browser, which makes it
impractical for the Privacy Dashboard to intercept its HTTP requests or to set
Flash-specific options. The extension is, however, able to access the local file system to
examine and, when requested, delete the files used for Flash Local Shared Objects.

The Privacy Dashboard also improves upon the browser’s built-in support, mak- 
ing it easier to track and revoke which sites you have told Firefox to provide your 
geolocation to. If you are on a WiFi connection, you can check to see just where 
Google thinks you are based upon your WiFi neighbourhood. 

The data collected by the Privacy Dashboard as you browse gives a view about 
a small part of the Web. By pooling data from many users, it will be possible to 
build up a much more detailed picture of how sites are tracking users. To this end, 
the Privacy Dashboard allows you to choose to share your findings with others. 
The information uploaded is limited to data about the site and its relationship to 
third party sites, and avoids any information that could be used to identify you. You 
can determine the server the uploads are made to, along with the frequency of the 
updates. 

To encourage users to share their data, the Privacy Dashboard invites users to 
opt in when running the Dashboard for the first time. Thereafter, users can review 
and change their sharing preferences on the “Share Findings” tab on the Privacy 
Dashboard user interface. Servers that pool the data should avoid logging the client’s 
IP address, time of upload, and the set of sites covered. This should be made clear 
in the server’s privacy policy. If you are at all concerned, you can of course set your
sharing preferences to use an anonymising proxy for your uploads. 

There are a number of other Firefox extensions related to privacy, e.g., Adblock 
Plus, NoScript and BetterPrivacy. These seek to block out web page ads, to disable 
scripting or to offer greater control over cookies and other tracking devices. The 
Privacy Dashboard also does that and adds the means for users to gain greater visi- 
bility into how sites are tracking them, and the means to query this data, as well as 
to contribute to a broader understanding of tracking across the Web. 
24.5 Privacy in Databases


Today, databases are often out-sourced to a database provider and sometimes also 
distributed over several database providers. PrimeLife has considered such scenar- 
ios and studied how sensitive personal data can be protected in such databases. In 
particular, we have designed and implemented two tools: Pri-Views and Over-Encrypt.


24.5.1 Pri-Views — Protecting Sensitive Values by Fragmentation 


When considering typical scenarios where databases are outsourced to a separate 
provider, one finds two important requirements: 1) the need to integrate the ser- 
vices of database providers that do not belong to the same organisation and 2) the 
presence of a variety of platforms, with an increase in the number and availability 
of devices that have access to a network connection, together with the presence of 
powerful servers offering significant computational and storage resources. The first 
aspect forces the requirement to specify security functions limiting access to the 
information stored in the databases. The second aspect instead forces an environ- 
ment where the data and computational tasks are carefully balanced between the 
lightweight device and a powerful remote server. The two aspects are strictly re- 
lated, since the servers are typically owned by service providers offering levels of 
cost, availability, reliability, and flexibility difficult to obtain from in-house opera- 
tions. In the literature, this problem has been addressed by combining fragmentation 
and encryption, thus splitting sensitive information among two or more servers and 
encrypting information whenever necessary [CDCdVF+10].

Our contribution (Pri-Views) is a different solution to the problem. Pri-Views departs
from encryption, thus freeing both the owner and the clients from the burden
of key management. In exchange, we assume that the owner, while outsourcing the 
major portion of the data to one or more external servers, is willing to locally store 
a limited amount of data. The owner-side storage, being under the owner control, is 
assumed to be maintained in a trusted environment. The main observation behind 
our approach is that often it is the association between data that is sensitive, in con- 
trast to the individual data items themselves. As with recent solutions, we therefore 
exploit data fragmentation to break sensitive associations; but, in contrast to them, 
we assume the use of fragmentation only. Basically, the owner maintains a small 
portion of the data, just enough to protect sensitive values or their associations. 

Pri-Views offers a prototype that is mainly used to test the greedy algorithm designed
to solve the problem of fragmenting data to protect sensitive associations,
while limiting the data owner workload. Pri-Views takes as input a relational table
and produces two views (vertical fragments) over it: one to be stored at the external
service provider, and one to be directly managed by the data owner. The tool is
composed of two applications: the first implements the proposed greedy algorithm
(developed in C++), while the second realises its Graphical User Interface (developed
in Java).

Through the GUI, it is possible to define a data collection, characterised by a set 
of attributes, and a set of constraints on the joint visibility of the data in the col- 
lection. The prototype produces a fragmentation that satisfies the constraints while 
minimising the storage and/or computational workload for the data owner. 
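
The flavour of such a fragmentation can be conveyed with a small Python sketch. It is not the Pri-Views algorithm: it is a generic greedy heuristic that treats each constraint as a set of attributes that must not all be visible at the external provider, and repeatedly moves to the owner the attribute that breaks the most remaining constraints. The attribute names in the usage note are hypothetical, and the tuple identifier that both fragments would need in order to be joined again is omitted.

```python
def fragment(attributes, constraints):
    """Split attributes into (owner_side, external_side).

    Each constraint is a set of attributes whose joint visibility is
    sensitive; a singleton constraint marks a sensitive attribute.
    """
    remaining = [set(c) for c in constraints if c]
    owner = set()
    while remaining:
        # Count how many unsatisfied constraints each attribute appears in.
        counts = {}
        for constraint in remaining:
            for attr in constraint:
                counts[attr] = counts.get(attr, 0) + 1
        # Greedy choice; ties are broken alphabetically to stay deterministic.
        best = min(counts, key=lambda a: (-counts[a], a))
        owner.add(best)
        remaining = [c for c in remaining if best not in c]
    return owner, set(attributes) - owner
```

For the attributes SSN, Name, DoB, ZIP and Illness with the constraints {SSN}, {Name, Illness} and {DoB, ZIP, Illness}, the sketch keeps SSN and Illness at the owner and outsources the other three attributes.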

There are several Open Source tools that support the design of relational database 
schemas. For instance, SQL Workbench and SQL Power Architect permit the graph- 
ical design of relational database schemas. Most Open Source DBMSs are integrated 
with design tools, like pgAdmin for Postgres. The specific fragmentation design 
problem supported by Pri-Views is not available in these systems. 


24.5.2 Over-Encrypt 


Over-Encrypt is a client-server web application that provides data sharing capabili- 
ties in an outsourcing scenario where the storage service provider is trusted neither 
for data confidentiality nor for enforcing access control functionalities. The strong 
points of our solution lie in the scalability and efficiency of the data outsourcing 
mechanism, and in the decentralised management of access control policies and 
their evolution. Our approach supports the user in the specification of access re- 
strictions to resources she wishes to share, via an external service provider, with a 
desired group of other users. Our proposal guarantees that only users in the specified 
group will be able to access the resources, which remain confidential to all the other 
parties, including the service itself.

For our prototype, we chose Java (JDK 1.6.0) as software platform to develop 
the application server cooperating with an Apache Tomcat web server and a Post- 
greSQL database server. At the client side, we developed a Mozilla Firefox exten- 
sion, with a binding to binary libraries written in C++, for the realisation of the 
cryptographic primitives. 

The target audience for our prototype is developers who want to use state-of-the-art
mechanisms to enforce access control policies over resources in a distributed
environment.

A family of tools that offers a service with some similarity to Over-Encrypt is 
represented by the tools for on-the-fly-encryption (OTFE). There are more than a 
dozen Open Source tools that support this service. Among them, TrueCrypt is the 
most well known. These tools support the encryption of the content of the file sys- 
tem, offering to the user transparent access to the file system, which is stored en- 
crypted on the disk: if the passphrase is not provided at the start of the system, the 
file system content is not accessible. There are two crucial aspects that distinguish 
Over-Encrypt from these systems. (1) OTFE tools do not typically support access 
control enforcement and only assume a single owner having complete access to the 
data; a single key is sufficient for each protected file system. (2) OTFE tools assume 
that the storage device is the local disk, whereas Over-Encrypt assumes the use of a
remote storage provider. 


24.6 Anonymous Credentials 


We have argued in Chapter 5 of this book that private credentials (or privacy- 
enhancing PKIs) are a fundamental building block to achieving privacy in authenti- 
cation. Essentially, users are issued attribute certificates from different organisations 
and can then later selectively reveal these attributes to a relying party without reveal- 
ing any of the other attributes. 

Further, in Section 18 we described our vision on privacy-preserving access 
control systems that are based on such private credentials (cf. Section 18.1) and 
how they can be implemented on the basis of standardised technologies (cf. Sec- 
tion 18.4). 

The following two subsections elaborate on our Open Source implementations 
of private credential systems on the one hand and components for building privacy- 
preserving credential-based access control systems on the other. 


24.6.1 Identity Mixer Crypto Library 


Identity mixer (Idemix) is an implementation in Java of a private credential system
based on the Camenisch-Lysyanskaya scheme [CL01]. More precisely, it is a library
that allows one to issue credentials and prove ownership of credentials. Thereby, the 
library also supports other cryptographic objects such as pseudonyms, encryptions 
of attributes and commitments to attributes. 

To orchestrate all these cryptographic objects, we have developed a specification 
language for the issuing protocol and also for the credential presentation protocols. 
Fig. 24.6 provides an example of this specification language for the credential spec- 
ification protocol (also called proof protocol). In this example, a user proves posses- 
sion of three different credentials, each of which has the same value for the attribute 
LastName. This value is further encrypted under a public key PublicKey1.

The library and documentation are available from the PrimeLife webpage. We 
refer to that documentation for a detailed description of the architecture, all specifi- 
cation languages, and details for the underlying cryptographic protocols. 
Declaration{ id1:unrevealed:string; id2:unrevealed:string;
             id3:unrevealed:int; id4:unrevealed:enum;
             id5:revealed:string; id6:unrevealed:enum }

ProvenStatements{
  Credentials{
    randName1:http://www.ch.ch/passport/chPassport10.xml =
      { FirstName:id1, LastName:id2, CivilStatus:id4 }
    randName2:http://www.ibm.com/employee/employeeCred.xml =
      { LastName:id2, Position:id5, YearsOfEmployment:id3 }
    randName3:http://www.ch.ch/health/healthCred10.xml =
      { FirstName:id1, LastName:id2, Diet:id6 } }
  Enums{
    randName1:CivilStatus = or[Marriage, Widowed]
    randName3:Diet = or[Diabetes, Lactose-Intolerance] }
  Commitments{ randCommName1 = {id1,id2} }
  Representations{ randRepName = {id5,id2; base1,base2} }
  Pseudonyms{ randNymName; http://www.ibm.com/employee/ }
  VerifiableEncryptions{ {PublicKey1, Label, id2} }
  Message{ randMsgName = "Data to be used only for ..." }
}


Fig. 24.6: Example proof specification using a Swiss passport, an IBM employee 
credential, and a Swiss health credential. 
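
The role of the identifiers in such a proof specification can be illustrated with a short Python sketch. It transcribes the Credentials block of Fig. 24.6 into a dictionary (reading the identifiers as id1 to id6) and derives which attributes are proven equal across credentials and which are revealed. The data structure and function names are assumptions of this example and are unrelated to the Idemix library's own API.

```python
# Attribute-to-identifier bindings, transcribed from Fig. 24.6.
CREDENTIALS = {
    "randName1": {"FirstName": "id1", "LastName": "id2", "CivilStatus": "id4"},
    "randName2": {"LastName": "id2", "Position": "id5",
                  "YearsOfEmployment": "id3"},
    "randName3": {"FirstName": "id1", "LastName": "id2", "Diet": "id6"},
}
REVEALED = {"id5"}  # identifiers declared as 'revealed'


def linked_attributes(credentials):
    """Map each identifier used more than once to the attributes bound to it.

    Attributes sharing an identifier are proven to hold the same value.
    """
    uses = {}
    for cred, attrs in credentials.items():
        for attr, ident in attrs.items():
            uses.setdefault(ident, []).append((cred, attr))
    return {ident: u for ident, u in uses.items() if len(u) > 1}


def revealed_attributes(credentials, revealed):
    # Attributes whose value the verifier actually learns.
    return sorted((cred, attr)
                  for cred, attrs in credentials.items()
                  for attr, ident in attrs.items() if ident in revealed)
```

Applied to the example, `linked_attributes` reports id2 as shared by the LastName attribute of all three credentials, while `revealed_attributes` lists only the Position attribute of the employee credential.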


24.6.2 Components for a Privacy-Preserving Access Control 
System 


The PrimeLife Policy Engine (cf. Section 20) is an implementation of the policy 
concepts developed throughout the project. Those concepts comprise access con- 
trol as well as usage control (also called data-handling) aspects. Although the en- 
gine itself is not available as an Open Source implementation, the components that 
concern the credential-based access control aspects are provided as Open Source. 
Those components constitute the major building blocks necessary for developing 
a stand-alone privacy-preserving credential-based access control system. To under- 
stand which components are provided in particular, consider Fig. 24.7 that illustrates 
how a privacy-preserving access control transaction takes place as well as the fol- 
lowing description of the Figure. 

A transaction involves three kinds of entities: users, servers (also called service 
providers), and issuers. Initially, a user contacts a server to request access to a re- 
source she is interested in (1). Having received the request, the server responds with 
the credential-based access control policy applicable for the resource (2). The appli- 
cable policy may be a composition (1a) of multiple policies the server holds. Upon 
receiving the policy, the user’s system evaluates which claims she can derive from 
her available credentials that fulfil the given policy (2a). The favoured claim is then 
chosen by the user interactively or automatically (2b). If the user wants to proceed, 
evidence for the chosen claim is generated by a plug-in specific to the respective 
credential technology (2c) and sent, together with the claim and the attributes to
reveal, to the server (3). Finally, the server verifies whether the policy is implied
by the claim (3a) and if the evidence supports the validity of the claim (3b). If so,
access to the resource is granted (4).


Fig. 24.7: Decision rendering in our system model.
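
The message flow of steps (2a) to (4) can be mimicked in a few lines of Python. This is a deliberately simplified sketch: a policy is modelled as a list of alternative attribute requirements, and the evidence is a plain HMAC computed with a demo key, which merely stands in for the cryptographic proof produced by a real credential technology such as Idemix. All names and attribute values are hypothetical.

```python
import hashlib
import hmac

# Stand-in for real credential cryptography; not how private credentials work.
DEMO_KEY = b"demo-issuer-key"


def candidate_claims(policy, attributes):
    # (2a) Keep every policy alternative the user's attributes can fulfil.
    return [alt for alt in policy
            if all(attributes.get(k) == v for k, v in alt.items())]


def choose_claim(claims):
    # (2b) Automatic choice: reveal as few attributes as possible.
    return min(claims, key=len)


def make_evidence(claim):
    # (2c) Placeholder evidence: a MAC over the canonicalised claim.
    message = repr(sorted(claim.items())).encode()
    return hmac.new(DEMO_KEY, message, hashlib.sha256).hexdigest()


def server_decides(policy, claim, evidence):
    implies_policy = claim in policy                                # (3a)
    valid = hmac.compare_digest(evidence, make_evidence(claim))     # (3b)
    return implies_policy and valid                                 # (4)
```

A user holding the attributes `{"member": True, "country": "DE"}` who is faced with the policy `[{"age_over_18": True, "country": "DE"}, {"member": True}]` would end up presenting only the membership claim.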

The Open Source components that we provide are capable of performing steps 
(2a), (2c), (3a) and (3b). For performing step (2b), the Send Data dialogue that is 
described in Section 14 could be employed. For implementing an entire privacy- 
preserving access control transaction as described above, the provided open-source 
components have to be integrated with an existing access control system and an 
appropriate messaging standard. In Section 18.4, we elaborate on how this can be 
done for XACML and SAML and we plan to provide an implementation of those 
concepts at the end of the project, i.e., an implementation capable of performing an 
entire privacy-preserving access control transaction. 


24.7 Conclusion 


The sections above illustrate that, within and outside the PrimeLife project, a large
number of Open Source initiatives related to privacy and identity management are
being developed. With time, some of these will grow and become de facto standards
and tools, while some will perish. This selection process will, contrary to what one 
might expect, not necessarily result in survival of the fittest. One of the problems 
in an Open Source setting is the inability to develop sustainable packages: Open 
Source products are developed, released and then left as is without active maintain- 
ers. It may be naive to assume that Open Source products prove themselves in such a 
case, and that the best solutions will stand out and survive because of their technical 
superiority or user-friendliness. To stay worthwhile, an Open Source project should 
grow: attract new developers, new community members, and new money. It should
scale in functionality, which also implies more complexity. Therefore, active advertisement
and uptake by large players are needed to bring even a superior product to
the masses. One example of this is the OpenOffice productivity suite, for which
a Strategic Marketing Plan was developed within the OpenOffice.org Marketing Project⁸.

General recommendations regarding Open Source can be found in the 2020
FLOSS Roadmap.⁹ Some of these recommendations can be made more specific
for the privacy and identity management field. Regarding openness and freedom 
in ICT infrastructures, the recommendations are that network neutrality should be 
protected with a legal framework, and by developing decentralised, user-controlled, 
free software-based web services for all essential social or collaborative applications.
This is necessary to keep users’ data under users’ control, as well as to
ensure that network anonymity technology is not blocked. Furthermore, the FLOSS 
Roadmap mentions that government entities should actively seek FLOSS-based so- 
lutions as much as possible. Particularly in the privacy and identity management 
area, this is a good strategy: as government entities typically deal with the identities 
and sensitive data of their citizens, they can be the first to set the example and in- 
troduce Open Source solutions to society. Open Source software that could help to 
protect the privacy of citizens should therefore be supported, used and integrated by 
government institutions. 

Apart from these general considerations, specific technologies in privacy and 
identity management remain insufficiently supported in Open Source, if supported 
at all. 

First of all, one essential part that is underdeveloped at the moment is a wide- 
spread framework into which Open Source privacy and identity management mod- 
ules can be plugged. A zoo of small solutions, each for one specific application or 
use, each with its own settings and installation defaults, can only be maintained by 
ICT-savvy people. Typical tasks of such a framework are the storage of the user’s 
privacy preferences (privacy policy), and the validation of requests against those 
preferences. The work within PrimeLife in the area of policy languages certainly 
establishes a good starting point for the development of such a framework. Another 
component of such a framework could be the Identity Metasystem [CJ06], taking 
care of the identity management of users, which should — in theory — be agnostic of 
specific identification technology. A considerable amount of technology that imple- 
ments the Identity Metasystem is Open Source already, including identity providers 
as well as relying party components. The Higgins framework and Microsoft code 
name ‘Geneva’ are efforts in this area. The development of U-Prove¹⁰ by Microsoft
and IBM’s Identity Mixer (Idemix) are available from PrimeLife’s website. Recent
work, seeded by Southworks, on bridging OpenID and WS-Federation,¹¹ indicates
that more technologies are integrated into the idea of the Identity Metasystem.


8 http://marketing.openoffice.org/

9 FLOSS: Free Libre and Open Source Software; http://www.2020flossroadmap.org/

10 https://connect.microsoft.com/content/content.aspx?contentid=12505&siteid=642

11 https://github.com/southworks/protocol-bridge-claims-provider

Despite this promising progress in the Identity Management part of such frame-
works, advanced privacy technology such as the PrimeLife policy language and its 
enforcement engine are yet to be integrated. 

Secondly, awareness tools and (related) data analysis tools for users are not thriv- 
ing in Open Source. The main efforts are seen in browsers implementing a primitive 
form of data minimisation (restrictions on cookies, ‘anonymous’ browsing, etc.),
and some support for privacy policy languages. Also, classical virus scanners, fire- 
walls, disk cleaners and anti-spyware will protect against unauthorised collection 
of data, but fail to assist the user with advice regarding the data that she wants to 
release voluntarily. The Privacy Dashboard developed within PrimeLife is one ef- 
fort in that direction, aimed at providing the user with an overview of the data she 
released. 

Furthermore, most of the tools described above work well in a PC setting, but 
this is a platform that is losing ground at a rapid pace. A majority of the people born 
today will not use a classic PC to access the Internet in their daily routine. Devices 
such as smartphones, game consoles, TV sets, tablet PCs and even ordinary house- 
hold appliances will try to provide seamless network access everywhere and at any 
time. “According to Gartner’s PC installed base forecast, the total number of PCs in 
use will reach 1.78 billion units in 2013. By 2013, the combined installed base of 
smartphones and browser-equipped enhanced phones will exceed 1.82 billion units 
and will be greater than the installed base for PCs thereafter.”¹² This opens up a
range of additional concerns: 


• Even if these devices will have the same computational power and software installed
  on them in the near future, they are present and being used at this moment.
  In the meantime, very sensitive information about the users can be collected. In
  any case, porting Open Source solutions to these devices, if at all possible, will
  take some time and effort.

• An additional problem is that some devices come with closed operating systems,
  for which the integration of Open Source poses a threat, especially when the
  related Open Source licenses behave in a viral way.

• New software challenges will arise when devices do not support a classical user
  interface, or feature peculiar hardware components. Again, additional efforts will
  be needed to adjust (and not only port) Open Source solutions to these devices.
  Moreover, mobile phones initiate fundamentally different uses than PCs due to
  their portability and their enhanced sensors. One obvious example is location,
  determined by GPS, WiFi or GSM cell. This triggers additional privacy challenges,
  even for non-users.¹³


12 Gartner Highlights Key Predictions for IT Organizations and Users in 2010 and Beyond;
http://www.gartner.com/it/page.jsp?id=1278413

13 Four Billion Little Brothers?: Privacy, Mobile Phones, and Ubiquitous Data Collection;
http://cacm.acm.org/magazines/2009/11/48446-four-billion-little-brothers/fulltext

In any case, we see that currently, these new devices are in their infancy with respect
to privacy technology, and that psychological barriers that existed in PC users are 
easily discarded. 

Finally, at the network layer, new technology will have to be developed to enable 
anonymity. Existing Open Source implementations such as TOR and the related 
JonDo, based on the AN.ON project, remain slow for common use, and even for 
ordinary operations that require limited bandwidth. This is a huge problem, con- 
sidering the continuous move towards faster networks necessary for online gaming, 
Video-on-Demand, IPTV, etc. Related to this challenge is the move towards cloud 
computing. Obviously, this can generate new privacy problems, especially in the 
case of SaaS (software as a service), which is in fact the recentralisation of data 
and processes, from end-user devices to entities, possibly operated by a single con- 
troller. On the other hand, the cloud can also be used in favour of privacy. In essence, 
TOR is a kind of cloud computing, and so is the Diaspora project,¹⁴ in which a distributed,
privacy-aware, personally controlled Open Source social network is being
developed. 

Summarising, additional funding and promotion of Open Source products for 
privacy and identity management should be aimed at: 


• identifying and marketing specific, technically superior Open Source products
  actively,
• establishing an interoperable framework in which Open Source modules for privacy
  and identity management can be developed and plugged,
• funding the creation of advanced user awareness tools in Open Source,
• Open Source for end-user devices other than the classic PC,
• new anonymity networking software for broadband traffic,
• Open Source initiatives that leverage the possibilities of cloud computing for
  privacy technology.


14 Diaspora: The privacy aware, personally controlled, do-it-all, Open Source social network.
http://www.joindiaspora.com/


Chapter 25


25.1 Introduction


Standardisation has many goals and facets: Standards are used for consumer protec- 
tion to achieve a minimum quality of certain products and services. Standards lead 
to lower cost because of a unified higher volume market. Standards also support 
interoperability that is vitally needed in ICT. 

The ICT landscape is characterised by an extensive division of labour between 
specialists. The device driver programmers rely on the information that the device 
manufacturers give them. The operating system developers rely on information and 
interfaces that are provided by the device drivers and by the CPU instruction set. 
Application developers rely on the interfaces from the operating system and web 
developers rely on the interfaces the Web provides. This means that ICT has a much 
larger need for agreed information that leads to interoperability. In short, ICT needs 
many more standards than the rest of the industry. 

But the function of standards in ICT goes far beyond pure interoperability. A new
set of interfaces is sometimes the way to open up an entirely new world, thus creating
new markets. For instance, it took new standards to bring the Web to mobile devices,
thus creating a huge new market for applications and commerce.

Quite often, the idea for such new markets comes out of research. However,
researchers usually do not take the pains to actually create the market. Mostly they
are satisfied to show that it should work in theory and perhaps provide a
demonstrator to showcase what it could look like. The European Commission realised this
gap and consequently has recently put a lot of emphasis on the relation between
research and standardisation.¹

Traditional industry standardisation is directed rather at achieving agreement
among several vendors whose products have converged sufficiently to formalise the
common understanding of how things should be done. Further, standardisation is
used by public authorities to achieve goals of consumer protection.


1 For further information on research and standardisation see http://copras.org/.


J. Camenisch et al. (eds.), Privacy and Identity Management for Life, 479
DOI 10.1007/978-3-642-20317-6_25, © Springer-Verlag Berlin Heidelberg 2011

In both fields, PrimeLife has developed activities and has generated impact in the
emerging markets concerning privacy and identity management. PrimeLife’s part
in ISO² standardisation has focused on high-level framework and platform
specifications that contain requirements on privacy-respecting software design. The
W3C³ work has concentrated on sustaining the dialogue between web developers,
browser makers and researchers, understanding privacy issues of the Web,
presenting possible solutions and searching for a possible consensus with the web
community. Finally, PrimeLife used the opportunity to offer drafts to the Network
Working Group of the IETF⁴.

In the following, this section gives an overview of PrimeLife’s approach to giving 
input to the ISO/IEC standardisation (cf. Section 25.2), the project’s collaboration 
with the W3C (cf. Section 25.3) and some results in the cooperation with the IETF 
(cf. Section 25.4). 


25.2 Standardisation in ISO/IEC JTC 1/SC 27/WG 5 


In ISO, subcommittee SC 27 of the joint technical committee ISO/IEC JTC 1 is in
charge of developing security standards for information systems. Among other things,
it is behind the 27000 series on information security management systems. Within
SC 27, working group 5 (WG 5) is responsible for standards within the identity
management and privacy area.

Early on, PrimeLife established a cooperation with WG 5 in the form of a liai- 
son agreement with the group. The reason for the liaison is that WG 5 is working 
on a number of standards that have commonalities with the aims and the scope of 
the PrimeLife project and we wanted to be able to influence these standards and to 
contribute with our knowledge and findings in the standardisation process. The con- 
tributions of PrimeLife have been very well accepted by WG 5 and we believe that 
we have had mutual benefit from the cooperation. Even though the whole spectrum 
of the standards within WG 5 is of interest, there are three projects that lie close to
the work going on in PrimeLife and we have therefore decided to concentrate our 
contributions to these standards. 

The projects concerned are the 24760 “A Framework for Identity Management”
standard, the 29100 “Privacy Framework” standard and the 29101 “Privacy
Reference Architecture” standard. Most of the contributions have been in the form of
discussions at working group meetings and comments on standard drafts; however, there
are some areas where PrimeLife has made very significant impact. The remainder
of the subsection will discuss specifically PrimeLife’s input to the Framework for
Identity Management and the Privacy Reference Architecture.


2 ISO stands for International Organization for Standardization. In ICT its work is often aligned
with the standardisation within IEC (International Electrotechnical Commission), cf.
http://www.iso.org/.

3 World Wide Web Consortium, cf. http://w3.org/.

4 Internet Engineering Task Force, cf. http://www.ietf.org/.


25.2.1 ISO 24760 — Framework for Identity Management


ISO 24760 aims at describing a framework for identity management and defining 
its components. The standard presents terminology, concepts, identity life cycle and 
best practices within the identity management area. It started out as a monolithic 
standard, but after suggestions from PrimeLife and other contributors, it was divided 
into three parts. The biggest issue within the standard has been around terminology 
and the interpretations of the different terms. There were also some discussions on 
the format of the descriptions of the different terms. PrimeLife suggested a total 
make-over of the structure and format of the terminology and as a result of this, one 
employee at one of our partners became the co-editor of the standard. 

Identity is an important and ambiguous concept in identity management. The 
understanding of the term (and the implications of that understanding) ranges from 
a collection of attributes associated with an individual to a collection of attributes 
making an individual unique. In the realm of natural or legal persons, it is easy to 
argue that an identity is a collection of attributes associated with an individual. 

However, if the identity concept is pushed into the realm of objects, the
understanding or the limits of the concept become problematic. Potentially, one could
argue that one unit of data would be an identity or that everything is an identity if an 
identity is defined as a collection of attributes associated with an object. A conse- 
quence of this is then that every computer system is an identity management system, 
which is not in line with the understanding of the experts in the field and could also 
make the concept of identity essentially useless since nothing exists that is not an 
identity. 

On the other hand, requiring that an identity always uniquely identifies the entity 
blurs the difference between identity and identifiers. More or less, this understand- 
ing makes it pointless to allow a user to have multiple identities in the system and 
potentially creates large privacy problems. As a consequence, one of the biggest is- 
sues regarding the terminology has been the concept of identity including terms like 
identifier and partial identity. The problem with partial identity is that the concept is 
rather new and not used that much outside of research circles. 

Some of the attending experts thought that it could be hard to push it into an 
industrial setting even if they do agree with and understand the concept. In the ter- 
minology discussions, PrimeLife has provided its view of the concepts. PrimeLife 
also contributed in making the document consistent and in advocating the users’ 
view and tried to gear the standard into a more user-centric model by providing the 
experience gained and discussions held during the project.


25.2.2 Introducing Privacy Protection Goals to ISO 29101 Privacy
Reference Architecture 


IT security⁵ and privacy protection are overlapping perspectives when implementing
IT systems. They both need to be considered already at the level of developing 
underlying architectures. 

Usually, IT security takes the perspective of an organisation, i.e. the objective is
to safeguard the assets of that organisation. Here the “Classic CIA Triad”⁶ of the
IT security protection goals (confidentiality, integrity and availability) is applied as
necessary for the specific context. These protection goals are useful to structure risks
and countermeasures, and to set up a working Information Security Management
System (ISMS).⁷

In contrast, privacy protection focuses on the individuals concerned, i.e., the Data
Subjects. Certainly the IT security protection goals confidentiality, integrity and
availability are important here, too, but they do not represent all areas that should be
covered when it comes to the privacy of an individual as well as to the compliance
with today’s data protection regulation.⁸

IT security protection goals such as confidentiality, integrity and availability may 
facilitate the implementation of privacy principles into an IT system, but do not suf- 
fice to cover all aspects of privacy protection. For privacy protection, these goals 
need to be complemented with a set of specific protection goals that also allow for 
the expression of mismatches and conflicts of different goals. Even with the three 
classical IT security protection goals, it always has to be determined how much each 
goal should be pursued and what balance between conflicting aspects of those goals 
should be achieved. With the extension to six of those high-level protection goals, 
potential conflicts are more visible, which is good because they have to be tackled 
when designing, operating and improving the IT systems. There is no “one size fits 
all” solution, but for each application context, individual balances and implementa- 
tions have to be determined, dependent on, e.g., the sensitivity of data, the attacker 
model, legacy issues from already existing components of the information system, 
and last but not least, legal obligations. 

To allow for a more holistic mapping of privacy principles, the three IT security
protection goals are supplemented by three privacy-specific protection goals:
transparency, unlinkability and intervenability, as explained below. A hexagon of
protection goals can be derived where each goal is countered with another one expressing
dualistic aspects of the protection, see Fig. 25.1 (cf. [RB11]). All protection goals
can in principle be applied to the information itself as well as to the processes
and technical layers. For each, the perspective of the Data Controller, the Data
Subject and a third party can be adopted. Privacy protection goals help to structure risks
and to define which measures to apply.


5 Note that we use the term “IT security” in its broad meaning of “information security” covering
all security aspects of the full information system, regardless of whether technological components
are involved or not. Among others, organisational processes, data on all kinds of media or the staff
involved in data processing are part of this comprehensive approach, e.g., when analysing risks or
selecting and implementing appropriate countermeasures.

6 CIA stands for Confidentiality, Integrity and Availability, not for the well-known secret service.

7 For more information see the standards ISO/IEC 27001 and ISO/IEC 27002 as well as ISMS in
the frameworks ITIL (Information Technology Infrastructure Library) and COBIT (Control
Objectives for Information and related Technology).

8 Credit for the research underlying this section goes to Martin Rost and Andreas Pfitzmann, see
also [RP09] and [RB11].


Fig. 25.1: Segments of security and privacy protection goals. 


To support and develop a common understanding of the aforementioned concepts 
that could only be addressed briefly herein, the terms and definitions above have 
been submitted as a comment from PrimeLife to the drafting of ISO 29101 Privacy 
Reference Architecture. 

In the following, the privacy-specific protection goals are explained: 

Transparency: For all parties involved in privacy-relevant data processing⁹
(specifically the Data Controller, Data Processor(s), Data Subjects as well as supervisory
authorities), it is necessary that they are able to comprehend the legal, technical, and
organisational conditions setting the scope for this processing. Examples for such
a setting could be the comprehensibility of regulatory measures such as laws,
contracts, or privacy policies, as well as the comprehensibility of used technologies, of
organisational processes and responsibilities, of the data flow, data location, ways
of transmission, further data recipients, and of potential risks to privacy. All these
parties should know the risks and have sufficient information on potential
countermeasures as well as on their usage and their limitations.


9 The term “privacy-relevant data processing” comprises all kinds of data processing that is or may
be privacy-relevant, i.e., have some influence on the privacy of individuals. Thereby it intentionally
chooses a wider approach than “processing of personal data” as it is regulated in European data
protection law. The term “privacy-relevant data” encompasses at least personal data and potentially
personal data.

Transparency is a necessity for important aspects of informational self-determination, 
such as access rights, informed consent and notification obligations of data proces- 
sors. It can be achieved or enhanced by several mechanisms, such as documentation, 
logging, reporting, data protection management systems as well as information of 
and communication with the Data Subject. 

Unlinkability: Unlinkability means that all privacy-relevant data processing is 
operated in such a way that the privacy-relevant data are unlinkable to any other 
set of privacy-relevant data outside of the domain (or the applicability of a well 
defined purpose), or at least that the implementation of such linking would require 
disproportionate efforts for the entity establishing such linkage. Unlinkability is the 
key element for data minimisation as well as purpose binding. Its objective is to 
minimise risks to the misuse of the privacy-relevant data and to prohibit or restrict 
profiling spanning across contexts and potentially violating the purpose limitations 
related to the data. 

Wherever feasible, Data Controllers, Data Processors, and system developers 
should completely avoid or minimise as far as possible the use and possibilities 
for linkage of privacy-relevant data, conceivably by employing methods for keeping 
persons anonymous, for rendering persons anonymous (“anonymisation’’), or for 
aliasing (“pseudonymisation”). Observability of persons and their actions as well 
as linkability of data to a person should be prevented as far as possible. If privacy- 
relevant data cannot be avoided, they should be erased as early as possible. 
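
One way to picture pseudonymisation that keeps data unlinkable across domains is to derive a different pseudonym for the same person in every domain. The following Python fragment is only a minimal sketch under our own assumptions: the function name `domain_pseudonym`, the key handling and the domain labels are hypothetical, and the keyed-hash construction is an illustration rather than anything prescribed by ISO 29101 or by PrimeLife.

```python
import hashlib
import hmac


def domain_pseudonym(user_id: str, domain: str, secret_key: bytes) -> str:
    """Derive a pseudonym that is stable within one domain but differs across domains."""
    # Assumption: a per-domain key is derived from one secret held by the Data Controller.
    # Without that secret, pseudonyms from two domains cannot be matched to each other.
    domain_key = hmac.new(secret_key, domain.encode("utf-8"), hashlib.sha256).digest()
    return hmac.new(domain_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical example values.
key = b"example-key-held-only-by-the-data-controller"
shop = domain_pseudonym("alice@example.org", "webshop", key)
forum = domain_pseudonym("alice@example.org", "forum", key)
```

Here `shop` and `forum` refer to the same person but cannot be linked by a party that does not hold `key`. Note that whoever holds the key can still re-identify, so this is pseudonymisation and not anonymisation.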

Intervenability: Intervenability aims at the provision of possibilities for Data 
Subjects, Data Controllers as well as supervisory authorities to intervene in all kinds 
of privacy-relevant data processing, where necessary. The objective is to offer cor- 
rective measures and counterbalances in processes. For Data Subjects, intervenabil- 
ity comprises the Data Subject rights to rectification and erasure or the right to file 
a claim or to raise a dispute in order to achieve remedy when undesired effects have 
occurred. For Data Controllers, intervenability allows them to have efficient means
to control their Data Processors as well as the respective IT systems to prevent
undesired effects. Examples of such means are the ability to stop a running process
to avoid further harm or allow investigation, to ensure secure erasure of data,
including data items stored on backup media, and to manually overrule automated
decisions or apply break-glass policies. For supervisory authorities,
intervenability could consist of ordering the blocking, erasure or destruction of data, or in
severe cases stopping the data processing entirely.

Intervenability can be achieved or supported by mechanisms such as the provi- 
sion of a single point of contact (SPOC). Other approaches include a separation of 
processes, as a means to allow the system to continue to be working, even if there 
is the need for intervention in a specific case. The Data Subject should be offered 
an easy and convenient way to exercise the Data Subject rights to rectification or 
erasure of personal data as well as withdrawing previously given consent. 


25 Contributions to Standardisation 485 


25.3 Web Privacy 


Privacy-enhancing technologies are great consumers of access control technology. 
PrimeLife is in no way an exception here. The early works and research on the 
PrimeLife model and policy engine consequently focused on access control and 
how to organise it. At the same time, with some help of the European Commis- 
sion, a coordination with other projects was organised. The coordination between 
the projects was called “PrimCluster”. Soon after the first PrimCluster meetings
with the projects SWIFT,¹⁰ TAS3,¹¹ and PICOS,¹² it became clear that all projects were
using and extending the eXtensible Access Control Markup Language (XACML) specified
by OASIS. Further inquiry in the community revealed that there were more projects
beyond the ones organised in PrimCluster that had new ideas and innovative
extensions concerning XACML. The topic was brought up in the Policy Languages
Interest Group (PLING)¹³ to determine interest from the industry. The response was
positive. PrimeLife decided to allocate the necessary resources for a Workshop on
Access Control Application Scenarios that would look specifically at XACML
innovations and beyond. W3C organised the workshop as a standardisation workshop
in November 2009 in Luxembourg.¹⁴

As the Web advances toward becoming an application development platform that 
addresses needs previously met by native applications, work proceeds on APIs to ac- 
cess information that was previously not available to Web developers. Work on Web 
Applications and on the Geolocation API for web sites triggered intensive privacy 
discussions. Device APIs providing broad availability of possibly sensitive data col- 
lected through location sensors and other facilities in a web browser is just one 
example of the broad new privacy challenges that the Web faces today. The privacy 
discussion was also brought into PrimeLife for further consideration and to explore
possible solutions. The dialogue was further broadened by PrivacyOS, where several
stakeholders had first discussions.¹⁵ All this together led to the Workshop on
Privacy for Advanced Web APIs¹⁶ in July 2010 in London to discuss the current work
on the user-facing side with a broader audience.

However, already at the Workshop on Privacy for Advanced Web APIs in London
it became clear that access control is not enough, neither on the server side nor on
the client side. While the London Workshop created a community willing to address
the technical challenges of privacy in the Web context, and while they started to have
lively discussions on a W3C-hosted mailing list,¹⁷ ideas about what to do have clearly
not taken shape yet. W3C organised a further Workshop on Privacy and Data Usage
Control¹⁸ in October 2010 in Cambridge (MA) to encourage further discussions
on the question of data usage once data has been collected. This again involved
requirements and expectations from the Device API community.


10 “Secure Widespread Identities for Federated Telecommunications”, http://www.ist-swift.org/

11 “Trusted Architecture for Securely Shared Services”, http://www.tas3.eu/

12 “Privacy and Identity Management for Community Services”, http://www.picos-project.eu/

13 http://www.w3.org/Policy/pling/

14 All the proceedings, minutes and papers are available under
http://www.w3.org/2009/policy-ws/.

15 https://www.privacyos.eu/

16 http://www.w3.org/2010/api-privacy-ws/

In December 2010, the last of the workshops in PrimeLife’s series of events
dedicated to standardisation was co-organised with the Internet Architecture Board
(IAB), W3C, the Internet Society (ISOC), and Massachusetts Institute of
Technology (MIT): the Workshop on Internet Privacy in Boston.¹⁹ A broader scope was
chosen intentionally to discuss upcoming issues in online privacy that need to be
tackled on a global scale.

In the following subsections, relevant results from the four workshops are briefly 
described. 


25.3.1 Workshop on Access Control Application Scenarios 


The Workshop on Access Control Application Scenarios²⁰ attracted 20 position
papers of rather diverse nature. Most of them were presented in the two-day workshop
in Luxembourg and the discussion converged towards four topics:


• Attributes
• Sticky Policies
• Obligations
• Credential-based Access Control


25.3.1.1 Attributes 


XACML provides a framework for access control systems in heterogeneous IT
landscapes. There is a protocol and some basic requirements that are common to all
access control systems. But XACML does not specify the semantics of the
conditions that have to be fulfilled to grant access. Those semantics are specified by the
actual implementer within an existing enterprise. This means that, in order to expand
to inter-enterprise interoperability or to widen use on an internet scale, XACML
needs semantics filling out its own framework that make access control conditions
predictable and interoperable even where there was no prior agreement on the
semantics of the access control conditions. The University of Bergamo and the
University of Milan contributed a paper describing extensions to XACML to make it
easily deployable and suitable for open web-based systems.


17 http://lists.w3.org/Archives/Public/public-privacy/

18 http://www.w3.org/2010/policy-ws/

19 http://www.iab.org/about/workshops/privacy/

20 http://www.w3.org/2009/policy-ws/

The participants presented their different vocabularies during the workshop. Var- 
ious areas were tackled: Apart from PrimeLife’s privacy vocabulary, work on access 
control in social networking or attribute vocabularies for export control, geospatial 
data and health care data were outlined in the workshop. The chair invited all partic- 
ipants to contribute their semantics to the TC XACML, which could act as a clearing 
house for those ontologies. This way, duplication of attributes could be avoided and 
a cleared vocabulary could be standardised for a wider audience and to achieve some 
basic interoperability for web or inter-enterprise consumption. 


25.3.1.2 Sticky Policies 


Applying access control scenarios beyond the borders of a well-walled enterprise 
does not only raise the question about agreed and interoperable access control se- 
mantics. It also raises the question of how to make sure that all users of a data 
record can respect the access restrictions if this record is travelling around from ser- 
vice to service, across company borders or from continent to continent on the Web. 
One solution is known under the name “Sticky Policy”.²¹ This means that there is
a persistent link between the access control information and other metadata and the 
record containing e.g., personal data. A parallel issue exists for Digital Rights Man- 
agement (DRM). There are several co-existing possibilities to organise the “Sticky 
Policies”, e.g., by using a binding as in XML Signature (detached and in line), pos- 
sibly supported by an online data store that contains the bindings, so that the Policy 
Enforcement Point (PEP) could just ask there. 
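
The idea of a persistent binding between a policy and the record it governs can be sketched in a few lines of Python. This is an illustration under stated assumptions only: it uses a keyed hash (HMAC) from the standard library as a stand-in for the XML Signature binding mentioned above, and the names `attach_policy` and `binding_is_intact` as well as the record and policy fields are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(record: dict, policy: dict) -> bytes:
    # Serialise record and policy together in a stable key order, so that the
    # same content always yields the same bytes.
    return json.dumps({"record": record, "policy": policy}, sort_keys=True).encode("utf-8")


def attach_policy(record: dict, policy: dict, key: bytes) -> dict:
    """Bundle a record with its policy and a tag computed over both."""
    tag = hmac.new(key, _canonical(record, policy), hashlib.sha256).hexdigest()
    return {"record": record, "policy": policy, "binding": tag}


def binding_is_intact(package: dict, key: bytes) -> bool:
    """Check that neither the record nor its policy was swapped or altered."""
    expected = hmac.new(key, _canonical(package["record"], package["policy"]),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, package["binding"])


# Hypothetical example values.
key = b"example-binding-key"
package = attach_policy({"name": "Alice"}, {"allowed_purposes": ["billing"]}, key)
tampered = dict(package, policy={"allowed_purposes": ["marketing"]})
```

A receiving service that checks `binding_is_intact(tampered, key)` would notice that the policy no longer matches the record it travelled with. A real deployment would more likely use asymmetric signatures, so that receivers can verify the binding without being able to forge it.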

An additional issue came up while considering that access policies with con- 
ditions travel around. The sending service has a set of policies, but the receiving 
service also already has a certain set of policies (endogenous policies). In practice, 
those policies must be combined in order to compute a concrete result on whether 
access can be granted, or whether the receiving service is able to accommodate the 
requirements from the sending service. It quickly became clear that the combinabil- 
ity of policies turns into a major requirement once more complex distributed systems 
or ad-hoc systems are considered. There are several algorithms already available, but
none of them is currently standardised. Standardisation of the combination
algorithm is, however, needed to design policies and systems with predictable results.
XACML currently provides a built-in set of policy combining algorithms, but work is
needed to determine their suitability for this application.
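
Why the choice of combining algorithm matters can be seen in a small Python sketch. It is a simplified illustration modelled on the deny-overrides and permit-overrides algorithms known from XACML; it deliberately leaves out the Indeterminate result and all error handling, and the function names are ours.

```python
from enum import Enum
from typing import List


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"
    NOT_APPLICABLE = "NotApplicable"


def deny_overrides(decisions: List[Decision]) -> Decision:
    # A single Deny wins over any number of Permits.
    if Decision.DENY in decisions:
        return Decision.DENY
    if Decision.PERMIT in decisions:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE


def permit_overrides(decisions: List[Decision]) -> Decision:
    # A single Permit wins over any number of Denies.
    if Decision.PERMIT in decisions:
        return Decision.PERMIT
    if Decision.DENY in decisions:
        return Decision.DENY
    return Decision.NOT_APPLICABLE


# Hypothetical case: the sticky policy of the sender permits the access,
# while the endogenous policy of the receiving service denies it.
results = [Decision.PERMIT, Decision.DENY]
strict = deny_overrides(results)
lenient = permit_overrides(results)
```

The same pair of policy results leads to opposite outcomes (`strict` is Deny, `lenient` is Permit), which is why sender and receiver need to agree on the algorithm before the result becomes predictable.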


21 See also Part IV.


25.3.1.3 Obligations


For privacy policies, there are conditions and actions that are not tied to an ac- 
cess control event. For the moment, XACML has an intentionally underspecified 
<Obligation> element that people were using in creative ways. But this underspec- 
ification has the side effect of undermining the interoperability of such obligations. 
Thus, one cannot be sure whether the specified actions are actually performed by 
the receiving service. One of the immediate requirements was that if the receiving
service does not understand the obligation, it should deny access and give feedback to
the requester.
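
The requirement that an obligation which is not understood must lead to a denial with feedback can be sketched as follows. This Python fragment is a hypothetical enforcement routine written for illustration; the function `enforce`, the obligation names and the handler table are our assumptions and do not reproduce the XACML or PrimeLife obligation languages.

```python
from typing import Callable, Dict, List, Tuple


def enforce(decision: str, obligations: List[str],
            handlers: Dict[str, Callable[[], None]]) -> Tuple[str, str]:
    """Return the final decision together with feedback for the requester."""
    if decision != "Permit":
        return "Deny", "access denied by policy decision"
    # Check all obligations first, so nothing is executed if one is unknown.
    unknown = [name for name in obligations if name not in handlers]
    if unknown:
        return "Deny", "obligation(s) not understood: " + ", ".join(unknown)
    for name in obligations:
        handlers[name]()
    return "Permit", "all obligations discharged"


# Hypothetical example: this service only knows how to log an access.
audit_log: List[str] = []
handlers = {"log-access": lambda: audit_log.append("access logged")}

granted = enforce("Permit", ["log-access"], handlers)
refused = enforce("Permit", ["log-access", "delete-after-30-days"], handlers)
```

In the second call the service refuses access because it cannot promise to honour `delete-after-30-days`, and the feedback names the obligation it did not understand.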

PrimeLife inherited an obligation language from the PRIME project and devel- 
oped it further. This was presented at the workshop. Also, other projects presented 
their work on obligations. Some participants suggested the use of Semantic Web 
technologies and the use of the W3C Rule Interchange Format (RIF). At this early 
stage, it was decided that further work was needed, possibly coordinated among 
several projects that could lead to a suggestion for TC XACML. 


25.3.1.4 Credential-based Access Control 


Credential-based Access Control would allow for a more privacy-friendly access 
control system that would also be more widely usable on the Web. The aim is to 
prove only selected attributes as needed for the task at hand. There is already a large 
set of literature on capabilities, but XACML currently does not have the ability to 
identify the type of credential used nor to specify which credential is needed to get 
access to a certain resource. This is more or less a special case of the attributes topic 
with additional protocol issues. One way to convey the credential would be to use 
SAML, but SAML only allows XML Signature as a proof token. 

Further steps in this direction are already being undertaken and the actual PrimeLife
protocol will be contributed to TC XACML to address credentials as access con- 
trol conditions. But the contribution will also make XACML itself more privacy- 
friendly. Today, if a user hits an access controlled resource, the system simply re- 
turns this resource as restricted. The user then tries as many credentials as she has 
until the resource opens. The XACML 2.0 protocol has no way to tell the user which 
credential it requires to open the access to the desired resource. The PrimeLife ex- 
tension enables the Policy Decision Point (PDP) to convey the type of credential it 
wants already in the response to the initial attempt to access a resource. 
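
The difference between a bare refusal and a refusal that names the required credential can be illustrated with a toy decision function. The sketch below is written in Python under our own assumptions: the `Response` type, the `REQUIREMENTS` table and the credential names are hypothetical and do not reproduce the actual PrimeLife protocol or any XACML message format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Response:
    decision: str
    # Filled in on denial, so the user knows which credential to present.
    required_credential: Optional[str] = None


# Hypothetical mapping from resources to the credential type they require.
REQUIREMENTS: Dict[str, str] = {"/members/forum": "proof-of-membership"}


def decide(resource: str, presented: Set[str]) -> Response:
    needed = REQUIREMENTS.get(resource)
    if needed is None or needed in presented:
        return Response("Permit")
    return Response("Deny", required_credential=needed)


first_try = decide("/members/forum", set())
second_try = decide("/members/forum", {first_try.required_credential})
```

With the hint from `first_try`, the user presents exactly one credential in the second request instead of trying, and thereby disclosing, every credential she holds.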


25.3.2 Workshop on Privacy for Advanced Web APIs 


The workshop on Privacy for Advanced Web APIs served to review experiences
from recent design and deployment work on device APIs, and to investigate novel
strategies toward better privacy protection on the Web that are effective and lead to
benefits in the near term. It focused on work done by the W3C Geolocation
Working Group, the W3C Device API and their security and privacy considerations. We
already see new app stores with applications for our mobile devices. Web
applications use web technology to provide such applications for desktop and mobile
devices. Questions from the Web Applications Working Group gained even
more emphasis through the W3C Technical Architecture Group’s recent and future work
on a Web Application Architecture.

PrimeLife came just in time to help organise this important workshop and also 
used it to distribute some of its results. 

The two practical proposals that drew most interest and discussions were the
Mozilla privacy icon approach²² and CDT’s privacy rule-set idea.²³ Both proposals
received a lot of positive feedback, and questions about their viability. In addition
to technical and user interface challenges, there were questions about the business 
incentives for browser vendors and large web providers, as one of the main obstacles 
for getting privacy from research and standardisation to deployment. Nevertheless, 
further investigation and experimentation with both approaches seems worthwhile 
and was encouraged. 

There was agreement that it is useful to capture the best current practices gained 
during early implementation efforts (such as those presented during the workshop 
regarding the geolocation API). Furthermore, investigating how to help specification 
writers and implementers to systematically analyse privacy characteristics in W3C 
specifications was seen as a worthwhile effort. The wealth of discussions and the 
enthusiasm of the participants of the Workshop encouraged people to continue the 
dialogue in a mailing list and possible future workshops. 


25.3.3 Workshop on Privacy and Data Usage Control 


As a complement to the considerations on access control and also as a complement 
to the considerations around APIs and the new challenges for web user agents they 
bring, there are also back-end considerations. Service side operations raise privacy 
questions beyond mere database design. How would a service make sure that data 
are used within the boundaries of the promises that had been given to the user and 
maintain the boundaries even if third party services are used to fulfill the user’s 
needs? It becomes immediately clear that the service side of things also has large 
implications for the user agent part of the equation. As a consequence, the Device 
API Working Group presented their list of requirements for privacy and looked for 
possible solutions. 

The workshop revealed that the complexity presented by PrimeLife to the
audience was not really an issue for the service-oriented businesses that typically handle
large amounts of data within complex systems. It was also clear that extending the
complex system to the user side of things and to user agents would not work
either. What could work is a set of simple semantics in the dialogue with the user
and her user agent, using the full complexity of solutions only within or between
privacy-enhanced services.


22 http://www.w3.org/2010/api-privacy-ws/papers/privacy-ws-22.txt

23 http://www.w3.org/2010/api-privacy-ws/papers/privacy-ws-12.html

It also became clear that a lot of education and communication is still needed.
Developers, and also the sections of the software industry that determine protocols
and capabilities, are still not sufficiently aware of the fundamental insights and
goals of privacy. High-level privacy goals were last translated in the 1980s with the
OECD Guidelines and the Census Decision of the German Federal Constitutional Court,
and subsequently with the Directive 95/46/EC on Data Protection. A translation into
concrete and tangible hints and advice for software development on the Internet and
on the Web is still missing. We do not really understand yet what the information
revolution of the past 20 years has brought. We are only starting to realise that the
old system of self-determination, which tends to become a bean-counting exercise, is
not what will help create technical remedies for our everyday life on the Web. So a
new effort of translation is needed. This means philosophers, technicians and lawyers
have to reconvene in discussions on what the threats really are and what goals can be
set and achieved. This suggests further interdisciplinary workshops.

It is also clear that the topic still missing from the discourse is the economy of
privacy. On the Web, personal data are a currency, and privacy protection is swimming
against the stream of the billions earned by targeted advertising. So one of the
questions that will have to be considered is: What framework will be needed to
encourage investment into privacy tools rather than into lucrative tracking tools
that augment the return per served ad?


25.3.4 Workshop on Internet Privacy 


At the Boston Workshop on Internet Privacy in December 2010, the 60 workshop 
participants from enterprise, governments, civil society, academia, as well as var- 
ious standardisation bodies discussed the question “How Can Technology Help to 
Improve Privacy on the Internet?” The objective was to explore conflicting goals of 
openness, privacy, economics, and security to identify a path forward for improving 
privacy. The discussed topics ranged widely, and covered, among others, the trans- 
fer of geolocation data, measurement of degrees of anonymity, private browsing, 
tracking of users via Facebook’s “Like” button, the “Do Not Track” initiative in the 
US and cookies in general. The problem of legal and cultural differences in the
perception and definition of privacy in our globalised world was also addressed.

It should be highlighted that the workshop resulted in an agreement to work
together in a number of areas within the broader internet technical communities
such as the IAB, W3C, and IETF.


25.4 PrimeLife’s Contributions to Standardisation in IETF


The Internet Engineering Task Force (IETF) is an open international community of 
network designers, operators, vendors, and researchers interested in the evolution 
of the Internet architecture. It is open to any interested individual who can register 
and attend IETF meetings, and can subscribe to and participate in Working Group 
mailing lists. The IETF Mission Statement is published in Request for Comments
(RFC) 3935²⁴ and states: “The goal of the IETF is to make the Internet work better.
The mission of the IETF is to produce high quality, relevant technical and engineer- 
ing documents that influence the way people design, use, and manage the Internet in 
such a way as to make the Internet work better. These documents include protocol 
standards, best current practices, and informational documents of various kinds.” 

Apart from increasing discussions among IETF experts and PrimeLife partners, 
the project’s results have been picked up in two early internet drafts: 


• “Privacy Preferences for E-Mail Messages”²⁵, i.e., the icon set “Privicons” that
aims at communicating the sender’s preferences for handling an e-mail to the
recipients (cf. Section 15.6), and

• “Terminology for Talking about Privacy by Data Minimization: Anonymity,
Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity
Management”²⁶, which is based on [PH10].


It remains to be seen how these documents will evolve and whether ideas from 
these drafts will affect internet standardisation at a later stage. 


25.5 Conclusion and Outlook 


PrimeLife has been involved in several standardisation initiatives, and several part- 
ners will stay active in this field even after the project has ended. However, work on 
global standardisation is a trudge whose efforts are often underestimated. Certainly 
vendors have commercial interest in shaping standards according to their products’ 
needs, and therefore they can usually invest more money and time in the standard- 
isation work than entities that do not get economic benefits. Nevertheless, it is of 
the utmost importance that upcoming standards consider and respect societal values 
and legal principles, even if this means that the results may get more complicated if 
they take into account the varying perspectives of different cultures. 

In the field of privacy and identity management, it is deemed necessary that
researchers from academia and industry, practitioners from government and
enterprises as well as representatives from data protection authorities and also
non-governmental authorities are empowered and encouraged to contribute to standards
beginning from an early stage. This crucial challenge has to be tackled so that
standards evolve that are not prone to inhibit human rights such as privacy or
non-discrimination.

24 http://www.ietf.org/rfc/rfc3935.txt
25 http://tools.ietf.org/html/draft-koenig-privicons
26 http://tools.ietf.org/html/draft-hansen-privacy-terminology


Chapter 26 
Best Practice Solutions 


Marit Hansen 


26.1 Introduction 


The PrimeLife project has worked in various areas of privacy and identity man- 
agement. Some are mainly relevant for researchers, some for practitioners in the 
application field, and yet others tackle upcoming policy issues that yield recom- 
mendations for policy makers. The following sections point out specific findings 
and results of the PrimeLife project: Firstly, we address industry as being repre- 
sentative for application development and service provisioning. Secondly, we give 
recommendations to policy makers on the European, international or national level. 
Finally, we show bits and pieces of PrimeLife’s legacy and sketch possible ways
in which they may be picked up and developed further. Note that we can only present
a small part of PrimeLife’s outcome here: we had to select some of the most
interesting best practice solutions that serve as examples of how PrimeLife’s results
are potentially valuable for other stakeholders.


26.2 Recommendations to Industry 


PrimeLife has elaborated various concepts and developed several tools that may be
of value for industry. In the previous sections, PrimeLife’s results regarding open
source tools and its initiatives in the field of standardisation have already been
described. Both can affect industry in multiple ways. For instance, application
developers are invited to look at the available open source tools or modules to check
whether they may be helpful in their own projects. Further, industry may participate
in the discussion on standardisation related to privacy and identity management. It
is clear that a few standards will be finalised in the near future; here it can be
profitable for companies to be early adopters and align their products or services
with the upcoming standards. In the case of mandatory or de-facto standards, the
adoption of standards will be essential to guarantee interoperability or compliance.
Privacy-enhanced application design that is in line with the developed standards can
also make it easier to be awarded a privacy seal that may serve as a competitive
advantage. In the following, we list a few of PrimeLife’s results and best practice
solutions together with the stakeholders within industry that can make best use of
them.

J. Camenisch et al. (eds.), Privacy and Identity Management for Life


26.2.1 Data Minimisation by Pseudonyms and Private Credentials 


Stakeholders: application developers, service providers, IT infrastructure develop- 
ers. 

Since PrimeLife’s preceding project “PRIME - Privacy and Identity Management
for Europe”, the use of pseudonyms and private credentials to combine the needs of
data minimisation and accountability has been stressed. This main theme has proven
valid in PrimeLife as well, and there are a few other applications that support
working with pseudonyms and even integrate private credentials. Still, these concepts
have not yet been widely picked up, although the principle of data minimisation
(disclosing and processing no more personal data than necessary) is anchored in
European data protection law. However, there has been some progress in recent years:
In addition to the Idemix system from IBM (cf. Section 24.6.1), which became part of
PrimeLife, related implementations exist that offer accountability while providing
data minimisation, in particular the U-Prove system from Microsoft and the new
German electronic identification card.

Recommendations for application developers and service providers: Try to min- 
imise personal data, offer pseudonymous or anonymous use of services, integrate 
private credential systems for a combination of data minimisation and accountabil- 
ity. 

Recommendations for IT infrastructure developers: Since many infrastructural 
components work with personal data of users, think of ways to minimise these data. 
This is particularly relevant when setting up identity infrastructures that involve 
authentication or identification of individuals. 
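
As a minimal illustration of pseudonymous use of services, the following Python
sketch derives a different, stable pseudonym for each service from a secret that
stays with the user. It is not how Idemix or U-Prove work: private credential
systems rely on dedicated cryptographic protocols, whereas this sketch only uses a
keyed hash (HMAC) to show the idea of unlinkable per-service identifiers. The
function names and the service identifiers are assumptions made for the example.

```python
import hashlib
import hmac
import secrets


def new_master_secret() -> bytes:
    """Create a random secret that is meant to stay on the user's device."""
    return secrets.token_bytes(32)


def service_pseudonym(master_secret: bytes, service_id: str) -> str:
    """Derive a stable pseudonym for one service from the user's secret.

    The same (secret, service) pair always yields the same pseudonym, so the
    service can recognise a returning user. Pseudonyms for different services
    cannot be linked to each other without knowing the secret.
    """
    digest = hmac.new(master_secret, service_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


if __name__ == "__main__":
    secret = new_master_secret()
    # Hypothetical service identifiers, used only for illustration.
    print(service_pseudonym(secret, "shop.example"))
    print(service_pseudonym(secret, "forum.example"))
```

Such pseudonyms give data minimisation but no accountability on their own; adding
accountability is exactly what private credentials are designed for.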


26.2.2 Improvement of Privacy Functionality in Social Media 


Stakeholders: developers and providers of social media. 

Social media have become an extremely successful application area with enormous
growth in numbers of users and exchanged data. Since the essence of social media is
communication among people and exchange of data related to these people, it does not
work well to remind users of the principle of data minimisation like a mantra.
However, improved privacy and identity management functionality for existing social
media would be more than helpful. This encompasses in particular the possibility of
audience segregation [Gof59], i.e., the compartmentalisation of different social
circles in which users can present themselves differently. This is very much related
to the principle of purpose binding from the European data protection law or to the
approach of contextual integrity [Nis04]. Right now, most users of social media do
not have different accounts at the same service (note that operating multiple
accounts per person is not encouraged or even excluded in the terms and conditions
of many social media services), and with the offered access control possibilities,
the user cannot map the usual variety of an individual’s social circles. In the
PrimeLife project, the social network Clique¹ has been developed that demonstrates
how audience segregation can be implemented and how this functionality can be used.
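
The idea of audience segregation can be sketched in a few lines of Python. This is
an illustrative model only and not the implementation used in Clique; the class
`Profile` and its methods are hypothetical names. Each post is stored together with
the circles that may see it, and a viewer sees a post only if she belongs to at
least one of those circles.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    owner: str
    # Maps a circle name (e.g. "family") to the set of its members.
    circles: dict[str, set[str]] = field(default_factory=dict)
    # Each post is stored together with the circles allowed to see it.
    posts: list[tuple[str, frozenset[str]]] = field(default_factory=list)

    def add_to_circle(self, circle: str, member: str) -> None:
        self.circles.setdefault(circle, set()).add(member)

    def publish(self, text: str, audience: set[str]) -> None:
        # Reject audiences that name circles the owner has not defined.
        unknown = audience - set(self.circles)
        if unknown:
            raise ValueError(f"unknown circles: {sorted(unknown)}")
        self.posts.append((text, frozenset(audience)))

    def visible_to(self, viewer: str) -> list[str]:
        # The owner always sees her own posts.
        if viewer == self.owner:
            return [text for text, _ in self.posts]
        viewer_circles = {c for c, members in self.circles.items() if viewer in members}
        # A post is visible if the viewer is in at least one audience circle.
        return [text for text, audience in self.posts if audience & viewer_circles]
```

A post published with an empty audience is visible to the owner only, which keeps
the default on the restrictive side.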

Another protection of the user’s content against undesired access can be realised
by encryption. Within the PrimeLife project, the Firefox extension Scramble!² has
been developed that can be used in Clique or other social networks. It encrypts
textual content in a way that only the desired audience in possession of the fitting 
cryptographic key can decrypt it and read the clear text. This form of audience 
segregation has the advantage that the social network provider does not have access 
to the clear text, either. With Clique alone, trust in the social network provider is 
essential because it is technically feasible for the provider to access all content. Note 
that even encryption of the content is not sufficient to protect the users from being 
spied on by the provider of a centralised social network who has the technical ability 
to monitor all log-in and log-out processes and analyse the social graph between the 
nodes in the social network. With the approach of decentralised social networks, 
the amount of knowledge per provider may be reduced, but this is not necessarily 
a panacea as long as there are no guarantees that these smaller providers (and their 
nodes in the social network) are trustworthy. 

Recommendations for developers and service providers of social media: Include 
appropriate privacy functionality in your system by conceptualising and implement- 
ing usable ways of privacy and identity management, in particular offer audience 
segregation. Encourage users to protect themselves, e.g., by encryption of content. 
For all functionality, make clear the privacy implications for the user, e.g., what 
personal data are being disclosed to whom, how the data can be erased, whether a 
function bears irrevocable consequences for the user or other individuals. Follow the
principle of “privacy by default”, i.e., configure the settings of the system in a way
that provides the most rather than the least privacy, e.g., prefer “opt-in” by the user
over “opt-out” for functionality that may infringe her privacy. Since most social
networks are financed by advertising targeted at their users, think of alternative
business models that do not require the collection and profiling of so much user data.


1 http://www.primelife.eu/results/opensource/64-clique, see also Sections 2.2.3
and 24.2.1.

2 http://www.primelife.eu/results/opensource/65-scramble, see also Sections 2.2.4
and 24.2.2.


26.2.3 Better Protection of the User’s Privacy on the Web


Stakeholders: users, employers. 

Today users are not well informed about how they are tracked on the web, and for
users who disapprove of being tracked, neither websites nor standard web browsers
provide sufficient protection against it. Within the PrimeLife project, the Firefox
extension Privacy Dashboard³ has been developed that logs the HTTP traffic to a
local database on the user’s computer and offers a variety of queries for analysing
the log entries. Among other things, users can see whether the website they are
currently visiting uses third party content or invisible images. They can also set
per site preferences, e.g., whether to block third party content, persistent cookies,
Flash cookies or scripting.
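
The logging-and-query approach described above can be sketched as follows. This is
not the Privacy Dashboard’s actual code or database schema; the table layout and
function names are assumptions chosen for illustration. The sketch records each
request together with the page that triggered it in a local SQLite database and
then lists the third party hosts contacted while visiting a given site.

```python
import sqlite3
from urllib.parse import urlsplit


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local request log."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "page_host TEXT NOT NULL, request_host TEXT NOT NULL, url TEXT NOT NULL)"
    )
    return db


def log_request(db: sqlite3.Connection, page_url: str, request_url: str) -> None:
    """Record one HTTP request together with the page that triggered it."""
    page_host = urlsplit(page_url).hostname or ""
    request_host = urlsplit(request_url).hostname or ""
    db.execute(
        "INSERT INTO requests VALUES (?, ?, ?)", (page_host, request_host, request_url)
    )


def third_party_hosts(db: sqlite3.Connection, page_host: str) -> list[str]:
    """List hosts other than the visited site that received requests from it.

    Comparing full host names is a simplification: subdomains of the visited
    site are treated as third parties here.
    """
    rows = db.execute(
        "SELECT DISTINCT request_host FROM requests "
        "WHERE page_host = ? AND request_host <> ? ORDER BY request_host",
        (page_host, page_host),
    )
    return [host for (host,) in rows]
```

Keeping the log on the user’s own computer means that the analysis itself does not
disclose any browsing data to another party.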

Recommendations for users: Firefox users are invited to install the extension so 
that they can be better aware of website tracking and enforce their preferences. Users 
of other browsers should use similar software that protects their privacy. 

Recommendations for employers: Employers in companies or administration
should make sure that their employees are well protected against unwanted data
disclosure on the Internet: Tracking of employees on the Web may not only harm their
privacy, but may also disclose information regarding their work and their
organisation that should be protected against industrial espionage. Tools such as
PrimeLife’s Privacy Dashboard can help to achieve a better protection by blocking
undesired tracking attempts and raising the awareness of users.


26.2.4 Better Information of Users on Privacy-Relevant Issues on 
the Web 


Stakeholders: service providers in all areas, including e-commerce, social media, 
search engines, comparison shopping sites. 

PrimeLife’s research on privacy policies has shown that usually they are not pre- 
cise enough so that users can really understand the data processing done (or planned) 
by the Data Controller. In addition, many privacy policies contain legalese, which 
makes it difficult and hardly appealing for laypersons, i.e., typical users, to study 
the texts. 

Recommendations for service providers: Service providers should offer more
readable and more precise privacy policies. They may follow the approach of the
Art. 29 Working Party to layered policies so that the first, immediately visible short
notice layer contains the core information, in particular the identity of the Data
Controller, the purposes of processing and any other information necessary to ensure
a fair processing. Users who are interested in more detailed information can get more
details from the extended versions of the privacy policy in Layer 2 (condensed
notice) and Layer 3 (full notice) [Par04]. The short notice idea has been fleshed out
by the “Send Data?” dialogue mock-up developed in PrimeLife that shows users all
necessary information on the planned data disclosure and possible options before
transferring these data (cf. Section 14.3). Further, the understanding of the privacy
policies or the user interfaces may be supported by informative privacy icons:
PrimeLife invested some work to design and evaluate icons conveying information in
various contexts (cf. Section 15). For e-mail or other communication services,
specific Privicons (as proposed in cooperation with PrimeLife, cf. Section 15.6) can
be used and integrated into software such as e-mail clients as well as e-mail
archiving systems and the related organisational processes. As soon as standards on
machine-readable policy languages or privacy icons evolve, service providers should
implement them accordingly. Finally, service providers should better support users in
exercising their Data Subject rights, i.e., their rights to access, rectify and erase
their personal data and also the right to withdraw their consent. In PrimeLife, the
Data Track was extended by online functions for users to exercise their rights as far
as granted by the service providers (cf. Section 13.4). Facilitating this can
contribute to the users’ acceptance and may also help to improve the data quality.

3 http://www.primelife.eu/results/opensource/76-dashboard, see also Section 24.4.

Recommendations for providers of search engines and comparison shopping 
sites: Both search engines and comparison shopping sites play an important role 
on the Web as they function as common entry points or gateways to the services the
users are looking for. For this reason, providers of search engines and providers of 
comparison shopping sites should offer to evaluate privacy-specific search criteria, 
e.g., whether a given privacy policy is matched. Parts of this matching can already 
be realised by a P3P-enabled website. In addition, search engines or comparison 
shopping sites could evaluate self-statements or third party statements on the natu- 
ral language privacy policy, on mechanisms such as the “Send Data?” dialogue, on 
support by privacy icons, or on awarded privacy seals. 
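
How a search engine or comparison shopping site might check whether a service’s
declared policy matches a user’s privacy criteria can be sketched as below. The
field names are hypothetical and deliberately much simpler than P3P or any other
policy language; the sketch only illustrates the matching step and reports the
reasons for a mismatch so that they could be shown next to a search result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """What a service declares (hypothetical, simplified fields)."""
    purposes: frozenset[str]
    retention_days: int
    shares_with_third_parties: bool


@dataclass(frozen=True)
class Preference:
    """What the user is willing to accept."""
    allowed_purposes: frozenset[str]
    max_retention_days: int
    allow_third_parties: bool


def mismatches(policy: Policy, pref: Preference) -> list[str]:
    """Return the reasons why a policy fails a preference (empty if it matches)."""
    reasons = []
    extra = policy.purposes - pref.allowed_purposes
    if extra:
        reasons.append("purposes not accepted: " + ", ".join(sorted(extra)))
    if policy.retention_days > pref.max_retention_days:
        reasons.append("retention too long")
    if policy.shares_with_third_parties and not pref.allow_third_parties:
        reasons.append("shares data with third parties")
    return reasons
```

A result list could then be filtered to services for which `mismatches` returns an
empty list, or ranked by the number of reported reasons.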


26.3 Recommendations to Policy Makers 


During the work on the PrimeLife project, it became apparent that many Data
Controllers, both inside and outside the European Union, do not meet the standard of
European data protection law. There are multiple reasons for the lack of compliance
with data protection law: Many system developers and Data Controllers are simply not
aware of legal provisions, or they do not know how to implement them. For new
technologies, it is often not even clear to experts in the field how to achieve
compliance with the applicable regulation because the legal texts, in some cases
written more than a decade ago, are phrased in a way that does not match upcoming
technologies, business models or people’s usage patterns.

The PrimeLife project did not aim at comprehensively analysing current gaps
between law and technology from the privacy point of view. However, within the
project, a few issues concerning today’s data protection law became apparent, and
in particular for the area of privacy and identity management throughout life, a few
proposals have been elaborated.

This section will deal with recommendations for policy makers


• concerning clear guidelines for all stakeholders involved in conceptualising,
designing, implementing and operating IT systems that can have privacy-relevant
aspects,

• concerning incentives and sanctions to foster data processing compliant with data
protection law, and

• concerning PrimeLife-related aspects where development of law should be
considered.


In addition, a general recommendation for policy makers is of great importance, 
but not further elaborated in this text: For building an information society that does 
not sacrifice privacy and security, much more education and awareness raising is 
required — for citizens of all ages and skills [ENI08]. 


26.3.1 Clear Guidelines for System Developers and Data 
Controllers 


For improving the level of privacy protection and for better compliance with data 
protection law, both system developers and Data Controllers need clear guidelines 
and an overview of best practices and best available techniques specific to the sector
of application [ENI08]. This would not only make it easier to assess privacy and 
security issues in internal or external auditing, but could also lead to a reduction of 
privacy and security breaches and diminish the risk to individuals’ privacy for the 
future. 

Especially in PrimeLife’s field of privacy and identity management for life, the
requirements for designing concepts and technologies are not clear to system
developers and Data Controllers both in industry and administration. Unlike the
social and legal systems, where society has gathered experience for several hundreds
or even thousands of years and could slowly evolve, the technological progress of our
time makes it hard to keep pace. Who could have predicted the current effects of
information technologies on our lives only a few decades ago? Planning terms in
companies often do not go much beyond five years, but we need future-proof solutions
for privacy and identity management that work for 80 or 100 years and cover all
stages of life [Han10].

Recommendations for policy makers: Policy makers and supervisory authorities
should make clear what they demand from Data Controllers, Data Processors and
system developers concerning privacy-relevant data processing, i.e., how to interpret
privacy regulation. As designing future-proof solutions for privacy and identity
management is not an easy task, clear guidelines should be elaborated and published
that refine today’s legal and social requirements and enable system developers and
Data Controllers to implement them accordingly. This encompasses guidelines on how
the full lifecycle of identity data has to be considered when designing IT systems,
how to give sufficient information to Data Subjects and users about all
privacy-relevant issues, or how delegation processes should be designed [Pri09]. In
addition, examples for and references to best practices and best available techniques
should be collected and published. The elaboration and regular update of all these
guidelines and of the overview of best practices and best available techniques
requires a defined process that involves all relevant stakeholders, in particular the
data protection authorities. Further, the appropriate infrastructural components
should be provided [HT10].


26.3.2 Incentives and Sanctions 


The current situation, where non-compliance with the European data protection
standard is the rule rather than the exception, is also caused by a lack of
incentives for Data Controllers and, relatedly, by the rarity of significant
sanctions following complaints by Data Subjects or audits by data protection
authorities in the European Union.

Recommendations for policy makers: Policy makers should consider incentives 
that encourage Data Controllers to comply with the data protection law and — even 
better — advance the state-of-the-art in privacy technology. This can be supported by 
trustworthy certification schemes and in-depth audits. Policy makers should revise 
the current framework for sanctioning privacy infringements and providing reme- 
dies for victims. Data protection authorities should be empowered to cover a signif- 
icant share of Data Controllers with inspections and impose noticeable punishments 
in case of privacy infringements. 

Moreover, the handling of complaints or of exercising the Data Subject rights 
should be improved, in particular concerning cross-border data flow. Providing stan- 
dard forms for complaints or for exercising Data Subject rights in different lan- 
guages, at least harmonised on the European level, could help in this respect. Fur- 
ther, the workflow for dealing with such requests both by Data Controllers and by 
data protection authorities could be standardised. This could also enable better sup- 
port by information technologies and user interfaces of identity management sys- 
tems as proposed by PrimeLife — culminating in possible online functions to lower 
the threshold for users to exercise their Data Subject rights. 


26.3.3 Development of Law 


The European Data Protection Directive was enacted in 1995 when the information
society was in the early stages of development. Whereas the values concerning
privacy and data protection that determine the legal text are still valid and seem to
be accepted by a majority of citizens at least throughout Europe, the methods of data
processing have changed so much that legal provisions of that time are not always
helpful in today’s situations.

One example is the area of social media that PrimeLife dealt with. It is charac- 
terised by a huge growth of the amount of exchanged (personal) data and of partic- 
ipating users: Are all of them aware of what happens to personal data disclosed via 
social media? Are these social media really based on informed consent by the users 
with a reliability of expectations, and what happens if users would like to withdraw 
their previously given consent? And if there are difficulties: should the concept of 
consent not be revised, e.g., to limit irrevocable or unexpected consequences? Even 
if individuals who may disclose data on their acquaintances do not become Data 
Controllers in the sense of the data protection law, which would require them to 
meet manifold requirements,⁴ how can privacy breaches concerning the personal
data of the acquaintances be effectively prevented? How do we prevent or deal with 
the risk that very few social network providers, who managed to make their sites 
central entry points for all web activities of many individual users, have access to 
so much data on almost all areas of life of their members (and in some cases: also 
of non-members)? This also applies to other single-point-of-entry pages that are 
offered (and often voluntarily used), e.g., search engines. 

Recommendations for policy makers: Policy makers should demand “privacy by
default”, e.g., better information of users and pre-sets for all IT systems so that
only minimal personal information is disclosed or transferred. They should rethink
the concept of consent and possibly limit data processing based on consent in its
scope or extent (e.g., consider expiry of consent after one year as a default).
Further, policy makers should make clear that data processing based on consent of the
Data Subject requires the Data Controller to outline the consequences of the consent,
in particular to show what is revocable and what is irrevocable and how it is
possible to revoke that consent. They should define and explicate areas where
irrevocable consequences are limited or prohibited altogether. Policy makers should
further limit exemptions to use (potentially) personal data for other purposes and be
extra cautious with sensitive data. Moreover, policy makers should seek ways to
efficiently implement fair user control (e.g., exercising Data Subject rights in an
easy and convenient way possible for all individuals). This could be supported by
establishing an international warning system for specific risks or breaches, similar
to governmental travel warnings for dangerous regions.
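
The suggested default expiry of consent, together with revocation, could be
represented on the side of a Data Controller roughly as in the following sketch. It
is an illustration under stated assumptions, not a description of a PrimeLife
component: the class `Consent`, its fields and the constant `DEFAULT_VALIDITY` are
hypothetical, and the one-year period is simply the default mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DEFAULT_VALIDITY = timedelta(days=365)  # the "one year" default from the text


@dataclass
class Consent:
    purpose: str
    given_at: datetime
    validity: timedelta = DEFAULT_VALIDITY
    revoked_at: Optional[datetime] = None

    def revoke(self, when: datetime) -> None:
        """Withdraw consent; processing from this moment on is not covered."""
        if self.revoked_at is None:
            self.revoked_at = when

    def is_valid(self, at: datetime) -> bool:
        """True if consent was given, has not expired and is not revoked at `at`."""
        if at < self.given_at:
            return False
        if self.revoked_at is not None and at >= self.revoked_at:
            return False
        return at < self.given_at + self.validity
```

Checking `is_valid` before every processing step makes expiry and revocation take
effect without any further action by the Data Subject.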

In addition, PrimeLife has worked on privacy and identity management through- 
out life. In principle, today’s legal provisions should hold when it comes to the need 
for maintaining privacy for the full lifetime and in all stages of life, but there is 
room for improvement. In particular, the instrument of delegation to exercise Data 
Subject’s rights should be recognised and explicated by law — this is necessary to 
cover all stages of life and achieve a fair balance between the interests of all parties 
involved. 


4 In many cases the so-called household exemption applies: The Data Protection
Directive 95/46/EC does not impose the duties of a Data Controller on an individual
who processes personal data “in the course of a purely personal or household
activity” [Par09].

Recommendations for policy makers: For legally relevant settings, policy makers
should regulate the circumstances of expressing and revoking delegation.
Additionally, they should define general principles or guidelines for delegation that
balance the interests, rights and duties of the parties involved in delegation, e.g.,
the person concerned, the delegates, the delegators, or other communication partners.
Further, policy makers should provide prerequisites to enable later revision of
privacy-relevant actions performed by the delegate on behalf of a person concerned.
Regarding the protection of minors in the area of privacy protection, the right of
young persons to privacy, including the right to exercise the Data Subject rights,
should be explicated in the law.

Lifelong privacy protection also means that policy makers can react to alterations
and upcoming challenges in our changing world. Here ex ante privacy assessments
of technical, regulatory, and legislative advancements could be helpful. Further,
there should be better precautions against risks and ways to deal with them, e.g.,
by strengthening the principles of data minimisation (including a “right to
oblivion”) and user control as well as contextual integrity to prevent information
being taken out of context [BPPB11]. Similarly, in addition to the traditional IT
security protection control objectives, i.e., confidentiality, integrity, and
availability, specific privacy protection control objectives should be considered and
implemented as appropriate, namely transparency, unlinkability and intervenability
[RP09], cf. Section 25.2.2.

Recommendations for policy makers: Policy makers should monitor changes in 
society, law and technologies and react appropriately, e.g., by evaluating chances 
and risks, adapting current processes, regulations or standards to the changed 
conditions, etc. On a micro-level, changes that may be relevant for individuals’ 
privacy include mergers and acquisitions. Here, policy makers should consider better 
protection of Data Subjects and better information in case of mergers and 
acquisitions, in particular in cross-border mergers or mergers in third countries. 
Furthermore, upcoming proposals to incorporate contextual integrity as well as 
privacy-specific control objectives into law should be evaluated. 

Finally, legal provisions may also hinder the employment of privacy-enhancing 
technologies. Think of private credentials that enable the disclosure of less personal 
data and can even prevent undesired linkage. However, there are several laws, 
especially in the governmental sector, that specify exactly which data are to be 
collected from the Data Subjects because it has traditionally been done that way for 
many years. This specified amount of data may have been reasonable in paper-based 
workflows, and now this experience has been transferred to the digital world. Still, it 
would be sufficient — and demanded by the data minimisation principle — if not the 
exact data fields, but only attributes were processed, e.g., not the exact birth date, 
but the proof that the Data Subject is at least 18 years old. In this example, the legal 
provisions themselves cement a process design that is not at all data minimising and 
that could be improved if the appropriate infrastructure were provided. 
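
The difference between processing a data field and processing an attribute can be 
sketched in a few lines of Python. The sketch is an illustration only and rests on 
assumptions that are not part of the text above: the function names, the shared key 
ISSUER_KEY and the HMAC-based attestation are hypothetical stand-ins, not the 
private-credential protocols themselves. Real private credentials let the Data Subject 
derive such a proof herself and keep different uses unlinkable, which this 
simplification does not attempt. 

```python
import hashlib
import hmac
import json
from datetime import date

# Hypothetical key shared by issuer and verifier (assumption, illustration only).
ISSUER_KEY = b"demo-key-not-for-production"


def age_in_years(birth_date: date, today: date) -> int:
    """Full years elapsed between birth_date and today."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def issue_age_attestation(birth_date: date, today: date, min_age: int = 18) -> dict:
    """Issuer side: sees the birth date but releases only the predicate result."""
    claim = {
        "predicate": f"age>={min_age}",
        "holds": age_in_years(birth_date, today) >= min_age,
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    tag = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "tag": tag}


def verify_age_attestation(attestation: dict, min_age: int = 18) -> bool:
    """Verifier side: checks authenticity and the predicate, never a birth date."""
    payload = json.dumps(attestation["claim"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, attestation["tag"]):
        return False
    claim = attestation["claim"]
    return claim["predicate"] == f"age>={min_age}" and claim["holds"] is True
```

The verifier in this sketch only ever handles the statement "age>=18" and its 
authenticity tag; the exact birth date stays with the issuer, which is the kind of 
process design the data minimisation principle asks for. 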

Recommendations for policy makers: Policy makers should evaluate current legal 
provisions in the light of private credentials. In addition, they should support setting 
Chapter 27 
PrimeLife’s Legacy 


Jan Camenisch and Marit Hansen 


A project’s legacy consists of three main parts: the product legacy, the process 
legacy, and the people legacy [CHM03]. Most parts of this book deal with the 
product legacy, i.e., the tangible outcome of PrimeLife in the form of prototypes, 
demonstrators, the code base, research papers, contributions to standardisation ini- 
tiatives, heartbeats and deliverables. In addition, project flyers, presentations, a large 
body of scientific publications, PrimeLife’s website, and other ways of managing the 
project’s knowledge belong to the product legacy. 

The process legacy encompasses the process knowledge of how the project’s 
objectives were achieved and how the results were elaborated. This includes im- 
proved capabilities for successfully and efficiently conducting or participating in 
future projects. All partners in the PrimeLife consortium have set up the 
appropriate workflows in their organisations to work together and produce results 
that contribute to the common vision. They have learned how to present and explain 
PrimeLife’s results to different kinds of audiences, to exchange valuable information 
at conferences, at exhibitions such as the European ICT Events or the international 
CeBIT fair, and at meetings with other projects or cluster events. 

While process legacy can be put down in writing and can be implemented in 
workflows, people legacy addresses the contacts gained by individual team 
members as a result of successful networking as well as tacit expert knowledge gathered 
throughout the project’s lifetime. Among other things, this knowledge has been built up 
because all members of the PrimeLife team have practised throughout the project 
how to work together among several disciplines, how to find a common language, 
how to bridge cultural differences, and how to deliver results that meet each other’s 
expectations. This is not only true for the people involved in each partner 
organisation for the full duration or parts of it, but also for the members of the 
PrimeLife Reference Group, for attendees of the two PrimeLife Summer Schools or for 
further participants of the workshops PrimeLife has organised. The networking effects 
of such projects should not be underestimated, because they support the further 
evolution of PrimeLife’s stimuli, which may blossom only a few years later. 
One of the spin-off projects of PrimeLife has already started: “ABC4Trust — 
Attribute-based Credentials for Trust” will work from 2010 to 2014 on using data 
minimising certificates such as IBM’s identity mixer (idemix) credentials and 
Microsoft’s U-Prove system in identity pilots. Also, we already know that results from 
the PrimeLife project will find their way into some of the forthcoming Future 
Internet Public Private Partnership projects funded by the EC. Other communities formed 
within the project will continue to exist, e.g., the core team of the Summer School 
will try to organise similar events after the project has ended. Finally, the individual 
participants of the PrimeLife project will be ready to answer questions concerning 
their specific field of research and participate in further discussions and new smaller 
and larger projects to promote privacy and identity management as PrimeLife un- 
derstands it. 